A Machine Learning-Based Clinical Decision Support System To Identify Prescriptions With A High Risk of Medication Error
doi: 10.1093/jamia/ocaa154
Advance Access Publication Date: 27 September 2020
Research and Applications
1Pharmacy Department, Groupe Hospitalier Paris Saint Joseph, Paris, France, 2Lumio Medical, Paris, France, 3Centre National
Hospitalier d’Information sur le Médicament, Paris, France, 4Pharmacy Department, Hospices Civils de Lyon University Hospital,
Lyon, France, 5Groupe Hospitalier Paris Saint Joseph, Paris, France, 6Medical Information Department, Groupe Hospitalier Paris
Corresponding author: Jennifer Corny, PharmD, Pharmacy Department, Groupe Hospitalier Paris Saint Joseph, 185 rue
Raymond Losserand, 75014 Paris, France; jcorny@hpsj.fr
Received 3 April 2020; Revised 10 June 2020; Editorial Decision 19 June 2020; Accepted 30 June 2020
ABSTRACT
Objective: To improve patient safety and clinical outcomes by reducing the risk of prescribing errors, we tested
the accuracy of a hybrid clinical decision support system in prioritizing prescription checks.
Materials and Methods: Data from electronic health records were collated over a period of 18 months. Inferred
scores at a patient level (probability of a patient’s set of active orders to require a pharmacist review) were cal-
culated using a hybrid approach (machine learning and a rule-based expert system). A clinical pharmacist ana-
lyzed randomly selected prescription orders over a 2-week period to corroborate our findings. Predicted scores
were compared with the pharmacist’s review using the area under the receiver-operating characteristic curve
and area under the precision-recall curve. These metrics were compared with existing tools: computerized
alerts generated by a clinical decision support (CDS) system and a literature-based multicriteria query prioritiza-
tion technique. Data from 10 716 individual patients (133 179 prescription orders) were used to train the algo-
rithm on the basis of 25 features in a development dataset.
Results: In an independent validation dataset of 412 individual patients (3364 prescription orders) analyzed by
the pharmacist, the areas under the receiver-operating characteristic and precision-recall curves of our digital
system were 0.81 and 0.75, respectively, indicating greater accuracy than the CDS system (0.65 and
0.56, respectively) and the multicriteria query technique (0.68 and 0.56, respectively).
Discussion: Our innovative digital tool was notably more accurate than existing techniques (CDS system and
multicriteria query) at intercepting potential prescription errors.
Conclusions: By primarily targeting high-risk patients, this novel hybrid decision support system improved the
accuracy and reliability of prescription checks in a hospital setting.
Key words: supervised machine learning, electronic prescribing, clinical pharmacy information systems, medication errors,
decision support systems, clinical
© The Author(s) 2020. Published by Oxford University Press on behalf of the American Medical Informatics Association.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/),
which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact
journals.permissions@oup.com
Journal of the American Medical Informatics Association, 2020, Vol. 27, No. 11 1689
INTRODUCTION
Medical errors are a major public health problem and a leading cause of mortality. With some 250 000 deaths per year in the United States, medical errors now rank after heart disease and cancer as the third leading cause of death.1 Even back in 1999, the Institute of Medicine highlighted the need for technologies to prevent the estimated 44 000 to 98 000 annual deaths resulting from medical errors.2 The problem is global, and the findings from the United States are readily supported by data from other countries that have also reported substantial rates of health care–related adverse events.3–7 During hospitalization, the majority of adverse events are
is making prediction at the patient level, rather than through predictions about individual prescription orders.
MATERIALS AND METHODS
Setting
The study was conducted in a large, private, nonprofit hospital (592 beds) in Paris that provides both surgical and medical activities. With the exception of neonatology and intensive care units that rely on a dedicated software program, all patient files are typically digitized and recorded using DxCare medical software (Dedalus, Le
Preprocessing
The binary classifier used 25 features engineered from heterogeneous inputs: numerical quantities, date/time objects, categorical values, and natural language open text fields. During preprocessing, all numerical features were calibrated for outliers, standardized and imputed, while categorical features were encoded, using scikit-learn and scikit-learn machine learning–compatible libraries.
Testing protocol
We tested the performance of the tool on a separate validation dataset (independent of the one used for model development). The accuracy of the algorithm was assessed and compared with the prescription orders reviewed by pharmacists and also with classic techniques: a CDS alert system and a multicriteria query approach.
Methodology
Over a 2-week period, a fully trained clinical pharmacist routinely analyzed randomly selected patient prescription orders on all wards and made a note of the interventions that followed. The selection of prescription orders came from an automatic daily extraction from the medical software documenting all patients who had at least 1 drug prescription. The pharmacist reviewed as many patients as possible over the 2-week period.
The data scientists were blinded to the actual pharmaceutical interventions. Data relating to these prescription orders were then used as inputs for the algorithm, which was thus tested on all wards. All predicted scores (a continuous variable: probability for a prescription order to contain errors) were then compared with the binary score: 1 = a pharmaceutical intervention was carried out during medication review; 0 = no pharmaceutical intervention.
For drug-related problems that were not intercepted by the tool (false negatives), a group of 2 physicians and 2 pharmacists ranked the level of severity (from 1 [minor] to 4 [life-threatening]). To determine this risk, the patient’s file was reviewed and the potential immediate or midterm harm was assessed.
Comparators
Two other prioritization processes were used as comparators to evaluate the performance of our algorithm: one based on patient-related data (the multicriteria query), and the other based on medication orders (CDS alert system). The latter makes use of a certified drug database to provide alerts after analyzing patient prescription orders. The CDS alert system thus raised alerts relating to drug interactions, dosage errors, and contraindications for renal insufficiency.
We also used a multicriteria query strategy based on 4 easily available and widely recognized criteria to target high-risk patients (ie, age, renal function [glomerular filtration rate], serum potassium, and international normalized ratio).17,18 The score was determined using the following thresholds with the calculation of the number of alerts raised:
• Age
  • >75 years: 1
  • Other: 0
• Estimated glomerular filtration rate (Modification of Diet in Renal Disease) formula:
  • <30 mL/min: 1
  • Other: 0
• Serum potassium level
  • <3 mmol/L: 1
  • >5 mmol/L: 1
  • Other: 0
• International normalized ratio
  • <5: 0
  • Other: 1
For example:
• a 40-year-old patient with no other criteria: score, 0
• an 80-year-old patient with an international normalized ratio of 4: score, 1
• a 76-year-old patient with a glomerular filtration rate of 27 mL/min: score, 2
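The multicriteria query score described in the Comparators section can be sketched in a few lines of Python. This is only an illustration of the thresholds listed in the text, not the code used in the study: the function name and parameter names are hypothetical, and treating a missing laboratory value as raising no alert is an assumption that the text does not state.

```python
from typing import Optional


def multicriteria_score(
    age_years: float,
    egfr_ml_min: Optional[float] = None,
    potassium_mmol_l: Optional[float] = None,
    inr: Optional[float] = None,
) -> int:
    """Count how many of the 4 criteria raise an alert (result is 0 to 4).

    Assumption (not from the paper): a value of None means the measurement
    is unavailable and is counted as no alert.
    """
    score = 0
    # Age: >75 years scores 1, otherwise 0.
    if age_years > 75:
        score += 1
    # Estimated glomerular filtration rate (MDRD): <30 mL/min scores 1.
    if egfr_ml_min is not None and egfr_ml_min < 30:
        score += 1
    # Serum potassium: <3 mmol/L or >5 mmol/L scores 1.
    if potassium_mmol_l is not None and (
        potassium_mmol_l < 3 or potassium_mmol_l > 5
    ):
        score += 1
    # International normalized ratio: <5 scores 0, otherwise 1.
    if inr is not None and inr >= 5:
        score += 1
    return score
```

Applied to the three examples given in the text, multicriteria_score(40) gives 0, multicriteria_score(80, inr=4) gives 1, and multicriteria_score(76, egfr_ml_min=27) gives 2.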
Ethics approval
Institutional review board approval was obtained from the local ethics committee. Considering the type of study, international review board approval was not required.
RESULTS
Data collection and development
Over an 18-month period, data were collected on 94 720 hospitalizations and a total of 61 611 patients (mean length of stay, 4.1 days; mean age, 69 years; female/male, 49.8%/50.2%), with a mean
For continuous scoring, such as the output of the hybrid algorithm (a probability), recall and precision were calculated by selecting the classification threshold that maximizes the F1 score.
The decision support algorithm outperformed classic systems in its capacity to both detect patients with a medication error (recall, also known as sensitivity), and to limit the number of false alerts (precision, also called the positive predictive value).
Figures 1 and 2 show the results with regard to AUCPR and AUROC.
Accuracy of medication review using the algorithm
Table 1. Comparative performance of classic prescription order analysis tools versus the Lumio Medication algorithm in terms of recall, precision, and F1 scores, as well as AUCPR and AUROC
AUCPR: area under the precision-recall curve; AUROC: area under the receiver-operating characteristic curve; CDS: clinical decision support; CI: confidence interval.
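The metrics reported here (AUROC, AUCPR, and recall and precision at the threshold that maximizes F1) can be illustrated with a minimal pure-Python sketch. It is not the authors' evaluation code: the function names are hypothetical, both classes are assumed to be present in the labels, and the AUCPR is approximated by the step-wise average precision, which may differ from the estimator used in the study. scikit-learn provides equivalent functions (roc_auc_score, average_precision_score, precision_recall_curve).

```python
def auroc(y_true, y_score):
    """Probability that a random positive outranks a random negative.

    Ties between a positive and a negative count as 0.5.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pr_points(y_true, y_score):
    """(threshold, precision, recall) for each distinct score as cut-off.

    A patient is flagged when score >= threshold; thresholds are visited
    from highest to lowest, so recall never decreases along the list.
    """
    n_pos = sum(y_true)
    points = []
    for t in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        points.append((t, tp / (tp + fp), tp / n_pos))
    return points


def average_precision(points):
    """Step-wise area under the precision-recall curve."""
    area, prev_recall = 0.0, 0.0
    for _, precision, recall in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


def best_f1(points):
    """Operating point that maximizes F1 = 2PR / (P + R)."""
    def f1(p, r):
        return 2 * p * r / (p + r) if p + r else 0.0

    t, p, r = max(points, key=lambda pt: f1(pt[1], pt[2]))
    return {"threshold": t, "precision": p, "recall": r, "f1": f1(p, r)}
```

In this sketch, y_true would hold the pharmacist's binary outcome (1 if a pharmaceutical intervention was made, 0 otherwise) and y_score the predicted probabilities; pr_points is computed once and passed to both average_precision and best_f1.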
accepted decreased by 30% for each additional reminder received, and by 10% for each 5% increase in the number of repeated alerts.
Given that current day processes of medication review do not have the capacity to cover all medical prescription orders, the process is in urgent need of improvement. Pharmaceutical interventions are still relatively scarce and therefore prioritization of medication reviews, based on the likelihood of drug-related problems and therefore medication errors, is essential. This study provided compelling evidence of the accuracy of artificial intelligence in identifying those patients with the greatest risk of errors in their prescription orders (ie, prioritizing patients in whom medication review is justified). Earlier studies identified several risk factors for medication errors, including patient age, renal dysfunction, and the number of drugs prescribed, to help pharmacists target interventions more effectively. However, we previously found that these risk factors accounted for only 34% of the variations in the number of pharmaceutical interventions.16 A multicriteria model-based strategy was also developed to identify patients whose prescription orders presented a high risk of containing errors. This model was based on 11 predictors, of which patient age and the number of drugs on the prescription were the most significant, with a C-statistic of 0.72.17 Nevertheless, of a total of 303 individual patients, 6 still needed to be reviewed for a drug-related problem to be detected, demonstrating the need for innovative approaches to make this activity more effective. Other studies reported a C-score model to detect only previously identified adverse events, with interesting results.25,26 However, these only focused on selected adverse events and did not consider all potential medication errors.
Our hybrid decision support system combining machine learning with a rule-based expert system was notably more accurate at detecting medication errors compared with other tools described in the literature. Two of 3 individual patient prescription orders reviewed by our tool triggered a pharmaceutical intervention, a figure that compares very favorably with the 20% in our development dataset or approximately 17% in the study by Nguyen et al.17 The sensitivity of our tool was also significantly higher than that of the CDS alert system or multicriteria query techniques. The hybrid model we have developed uses both knowledge-driven (expert system) and data-driven (machine learning) approaches. It can therefore be expected to overcome the main shortcomings of both these techniques by (1) not overfitting and consequently reproducing the same error patterns that occurred in the development dataset; (2) addressing the issue of certain infrequent though critical medical errors such as the so-called never-events, that is, serious incidents that are wholly preventable, as highlighted by the French Agency for the Safety of Health Products; and (3) reducing the number of false positive alerts typically seen with tools such as CDS alert systems.
This tool can also be easily adjusted by the addition of specific rules to account for noisy or conflicting categories that the algorithm has not yet learnt to deal with.
One recently published study presented the results of an outlier detection machine learning–based tool in a real-life setting that exhibited 89% accuracy in terms of the alerts raised.27 However, only 0.4% of the prescription orders generated an alert, whereas in our study, 6.3% of prescription orders were associated with a pharmaceutical intervention. In addition, the authors did not report any data on the sensitivity of this tool; it is therefore legitimate to speculate that because the system focused solely on outlier detection, other common medication errors likely went undetected.
The capacity of our hybrid system to constantly be adjusted as more prescription orders are reviewed by clinical pharmacists and more potential error patterns identified gives it a significant advantage over existing CDS systems.
Our findings are currently limited in scope in as far as the study was conducted in a single hospital setting. In addition, neonatology and intensive care unit patients were not included because they are managed by a different medical software. Consequently, evidence of the accuracy of the algorithm to identify prescription order errors in these units has yet to be demonstrated and our results can therefore not be applied to these patients. Importantly, more pharmaceutical interventions were recommended during the test phase than in the development dataset. There are 2 possible explanations: (1) in the validation dataset, prescription orders were reviewed by an experienced clinical pharmacist, whereas several pharmacists (junior and senior) with different levels of experience were involved in the medication review over the 18-month data collection and development period; and (2) input with regard to pharmaceutical interventions for the validation dataset and the development dataset came from different wards.
The next step is to deploy our system throughout other hospitals, thus extending the patient population covered. This will also enable us to benefit from the experience of a greater number of clinical pharmacists to confirm our findings. Adjustments are currently being made to the algorithm to integrate unstructured data to further improve the performance of this tool. As an example, while the algorithm used in this study does not yet identify wrong-patient errors, adjustments presently underway will enable it to address potential errors on medical notes (free text) associated with CPOE. Finally, a real-life evaluation is currently being conducted to assess the performance of this tool in daily medication reviews.
CONCLUSION
A hybrid machine learning–based decision support system has been developed to intercept prescription orders with a high risk of containing at least 1 medication error. Given that it is based on machine learning– and rule-based alerts, this decision support system has the advantage of not overfitting errors, of decreasing alert fatigue, and also of addressing infrequent but nevertheless potentially critical errors. The positive clinical implication of this model, which has demonstrated superior accuracy to both CDS alert systems and multicriteria query prioritization strategies, is the 3-fold improvement in the efficacy and productivity of medication review in a hospital setting and thus greater patient safety.
FUNDING
This research received no specific grant from any funding agency in the public, commercial or not-for-profit sectors. The algorithm developed by Lumio, and that will soon be available on the market for all hospitals, was provided without cost to the hospital pharmacy to enable the tool to undergo rigorous
nationale-sur-les-evenements-indesirables-graves-associes-aux-soins Accessed October 31, 2019.
8. Brennan TA, Leape LL, Laird NM, et al. Incidence of adverse events and negligence in hospitalized patients. Results of the Harvard Medical Practice Study I. N Engl J Med 1991; 324 (6): 370–6.
9. Bates DW, Boyle DL, Vander Vliet MB, et al. Relationship between medication errors and adverse drug events. J Gen Intern Med 1995; 10 (4): 199–205.
10. Bobb A, Gleason K, Husch M, et al. The epidemiology of prescribing errors: the potential impact of computerized prescriber order entry. Arch Intern Med 2004; 164 (7): 785–92.
11. Villamañan E, Larrubia Y, Ruano M, et al. Potential medication errors associated with computer prescriber order entry. Int J Clin Pharm 2013; 35