
Random forest-based prediction of stroke outcome

Scientific Reports


Carlos Fernandez‑Lozano1,2, Pablo Hervella3, Virginia Mato‑Abad4, Manuel Rodríguez‑Yáñez5, Sonia Suárez‑Garaboa4, Iria López‑Dequidt5, Ana Estany‑Gestal6, Tomás Sobrino3, Francisco Campos3, José Castillo3, Santiago Rodríguez‑Yáñez4* & Ramón Iglesias‑Rey3*

We investigated the clinical, biochemical and neuroimaging factors associated with the outcome of stroke patients in order to generate a machine learning model that predicts mortality and morbidity 3 months after admission. The dataset consisted of patients with ischemic stroke (IS) and non-traumatic intracerebral hemorrhage (ICH) who were admitted to the Stroke Unit of a European tertiary hospital and prospectively registered. We identified the main variables for a Random Forest (RF) model that estimates patient mortality/morbidity in the following groups: (1) IS + ICH, (2) IS, and (3) ICH. A total of 6022 patients were included: 4922 (mean age 71.9 ± 13.8 years) with IS and 1100 (mean age 73.3 ± 13.1 years) with ICH. NIHSS at 24 and 48 h and axillary temperature at admission were the most important variables for the evolution of patients at 3 months. The IS + ICH group was the most stable for mortality prediction [area under the receiver operating characteristics curve (AUC) of 0.904 ± 0.025]. The IS group presented similar results, although variability between experiments was slightly higher (AUC of 0.909 ± 0.032). The ICH group was the one in which RF had the most difficulty making adequate predictions (AUC of 0.9837 in the best experiment vs. 0.7104 in the worst). There were no major differences between the IS and IS + ICH groups in morbidity prediction (AUC of 0.738 and 0.755). Normality was checked with a Shapiro–Wilk test, whose null hypothesis that the data follow a normal distribution was rejected with W = 0.93546 (p-value < 2.2e−16).
Because the conditions required for a parametric test did not hold, we performed a paired Wilcoxon test under the null hypothesis that all the groups have the same performance. The null hypothesis was rejected (p-value < 2.2e−16), so there are statistically significant differences between the IS and ICH groups. In conclusion, the RF machine learning algorithm can be effectively used in stroke patients for long-term outcome prediction of mortality and morbidity.

1 Department of Computer Science and Information Technologies, Faculty of Computer Science, CITIC-Research Center of Information and Communication Technologies, Universidade da Coruña, A Coruña, Spain.
2 Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR). Instituto de Investigación Biomédica de A Coruña (INIBIC). Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, A Coruña, Spain.
3 Clinical Neurosciences Research Laboratory (LINC), Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain.
4 Software Engineering Laboratory, Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruña, Campus de Elviña, 15071 A Coruña, Spain.
5 Stroke Unit, Department of Neurology, Health Research Institute of Santiago de Compostela (IDIS), Hospital Clínico Universitario, Rúa Travesa da Choupana, s/n, 15706 Santiago de Compostela, Spain.
6 Unit of Methodology of the Research, Health Research Institute of Santiago de Compostela (IDIS), Santiago de Compostela, Spain.
*email: santiago.rodriguez@udc.es; ramon.iglesias.rey@sergas.es

Scientific Reports | (2021) 11:10071 | https://doi.org/10.1038/s41598-021-89434-7

| Demographic variables | IS + ICH (n = 6022) | IS (n = 4922) | ICH (n = 1100) |
|---|---|---|---|
| Age (years) | 72.1 ± 13.7 | 71.9 ± 13.8 | 73.3 ± 13.1 |
| Female gender (%) | 44.5 | 44.8 | 41.5 |
| Arterial hypertension (%) | 66.7 | 63.7 | 60.7 |
| Diabetes mellitus (%) | 23.4 | 24.1 | 20.4 |
| Alcohol consumption (%) | 12.2 | 11.5 | 15.4 |
| Smoking (%) | 15.4 | 16.4 | 10.7 |
| Dyslipidemia (%) | 35.4 | 35.1 | 36.7 |
| Peripheral arterial disease (%) | 5.7 | 5.9 | 4.6 |
| Ischemic heart disease (%) | 10.8 | 11.3 | 8.6 |
| Atrial fibrillation (%) | 20.8 | 24.1 | 18.1 |
| Previous transient ischemic attack (%) | 5.4 | 6.1 | 2.5 |
| Previous ischemic stroke (%) | 13.1 | 13.6 | 9.8 |
| Previous intracerebral hemorrhage (%) | 2.4 | 0.9 | 9.8 |
| Previous anticoagulants (%) | 9.5 | 8.5 | 14.1 |
| Previous platelet antiaggregants (%) | 22.9 | 24.4 | 16.5 |

Table 1. Demographic variables of the experimented dataset of patients summarized by group.

Stroke is the second leading cause of death and the third leading cause of disability in the world. Approximately 15 million people experience a stroke every year worldwide; 33% are left with a permanent disability and 40% of cases result in death, and by 2030 stroke will result in the annual loss of over 200 million (death or disability) globally [1]. Developing an appropriate long-term management plan and studying the progress in the management of stroke patients is necessary in order to organize healthcare structures for the coming years [1–5]. Predicting functional outcome after stroke would help clinicians make patient-specific decisions [6–10].

Machine learning (ML) provides a promising tool for predicting disease evolution and is increasingly used in biomedical studies. The application of ML in healthcare is widely anticipated as a key step towards improving care quality, and would play a fundamental role in the development of learning healthcare systems. Large-scale studies in the general literature provide evidence in favor of some classifier families, such as Random Forest (RF), in terms of classification performance [11,12].
RF has recently been used successfully in a wide range of biomedical applications, such as the automatic detection of pulse during electrocardiogram-based cardiopulmonary resuscitation or breast cancer diagnosis using mammography images [13–18]. As to stroke, most studies focus on the use of ML methods to detect ischemic stroke (IS) lesions using neuroimaging data [19–23] and on outcome estimation [24–28]. Only recently, however, has a study evaluated stroke outcome prediction at 3 months in a group of non-traumatic intracerebral hemorrhage (ICH) patients as well, using a nationwide disease registry [27]. Previous studies concluded that ML techniques can be effective in predicting the long-term functional outcome of IS patients or symptomatic intracranial hemorrhage following thrombolysis from CT images. However, all works agree on the need to carry out further studies in order to confirm results, incorporate new variables and resolve their limitations or weaknesses.

Taking into account the prevalence of cerebrovascular diseases, accurately predicting stroke evolution is essential to stratify the rehabilitation care that should be administered, especially to patients with the best chance of recovery. The administration of rehabilitative therapies to those who are unlikely to benefit from them is inefficient for the healthcare system and inconvenient and unproductive for patients. A predictive model that identifies stroke patients at risk of deterioration would make it possible to select and follow up patients for reperfusion treatments, and to increase the control of therapeutic homeostasis, thus addressing the needs of each patient individually. Furthermore, for new regenerative cellular or molecular therapies, it is essential to identify accurately the patients most likely to respond to treatment.
We hypothesized that models developed with ML techniques based on the demographic, clinical, biochemical and neuroimaging variables obtained in the first 48 h after stroke are accurate predictors of stroke mortality and morbidity at 3 months.

Results

We included 6022 patients in the study: 4922 (81.8%) presented with IS and 1100 (18.2%) with ICH. We excluded 228 patients who died during the first 24 h and 84 with no follow-up at 3 months. The 65 features of the different groups included in the experimented dataset are shown in Tables 1, 2 and 3. Figure 1 shows the flowchart of patient groups with their functional outcome, divided into morbidity and mortality.

Of the 4922 IS patients valid for this study, 55.2% were male and 44.8% female; the mean age was 71.9 ± 13.8 years. According to the TOAST classification, 1127 patients were classified as atherothrombotic (22.9%), 1786 as cardioembolic (36.3%), 428 as lacunar (8.7%) and 1520 as undetermined (30.9%). Poor functional outcome at 3 months was found in 47.5% of IS patients, with a morbidity of 33.4% and a mortality of 13.2%.

Of the 1100 ICH patients, 58.5% were male and 41.5% female; the mean age was 73.3 ± 13.1 years. ICH etiology was hypertensive in 506 patients (46%), amyloid angiopathy in 114 (10.4%), related to anticoagulants in 156 (14.2%) and due to other causes in 323 (29.4%). Poor outcome at 3 months was found in 58.6% of ICH patients, with a morbidity of 27.6% and a mortality of 30.2%.

Using filter feature selection, the dataset was reduced to only 7 variables: National Institute of Health Stroke Scale score at admission [NIHSS (0)]; NIHSS score at 24 h [NIHSS (24)]; NIHSS score at 48 h [NIHSS (48)]; axillary temperature at admission [T(0)]; early neurological deterioration [ED]; leukocytes at admission [LEU (0)]; and blood glucose at admission [GLU (0)]. With these variables, RF was much more stable and the variation between experiments was reduced.
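The filter step that produced these 7 variables scores each feature independently of the classifier (a T-test relevance score, according to the Materials and methods). The following is a minimal Python sketch of that idea, not the authors' R code. It assumes Welch's unequal-variance form of the t statistic, which the paper does not specify, and the names `welch_t`, `rank_features`, `nihss_48` and `noise`, as well as the toy values, are hypothetical.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples (assumed variant)."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


def rank_features(rows, labels):
    """Return (feature, |t|) pairs sorted from most to least relevant.

    rows   : list of dicts mapping feature name -> numeric value
    labels : list of 0/1 outcomes aligned with rows
    """
    scores = {}
    for name in rows[0]:
        pos = [r[name] for r, y in zip(rows, labels) if y == 1]
        neg = [r[name] for r, y in zip(rows, labels) if y == 0]
        scores[name] = abs(welch_t(pos, neg))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Toy, made-up data (not from the registry): "nihss_48" separates the
# two classes, "noise" has the same mean in both.
rows = [
    {"nihss_48": 20, "noise": 5}, {"nihss_48": 18, "noise": 2},
    {"nihss_48": 22, "noise": 4}, {"nihss_48": 3, "noise": 3},
    {"nihss_48": 5, "noise": 5}, {"nihss_48": 2, "noise": 3},
]
labels = [1, 1, 1, 0, 0, 0]

ranking = rank_features(rows, labels)
# Manual cut-off, as in the paper: here we simply keep the top feature.
selected = [name for name, _ in ranking[:1]]
```

In a real pipeline the scores would be computed on the training folds only, so that the cut-off does not leak information from the validation data; a feature with zero variance in both classes would also need to be handled before dividing.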
| Clinical/Neuroimaging variables | IS + ICH (n = 6022) | IS (n = 4922) | ICH (n = 1100) |
|---|---|---|---|
| Stroke on awakening (%) | 8.3 | 9.1 | 4.6 |
| Previous mRS | 0 [0, 1] | 0 [0, 1] | 1 [0, 1] |
| Time from stroke onset (minutes) | 239.1 ± 175.2 | 240.8 ± 167.4 | 231.3 ± 206.1 |
| NIHSS score at admission | 13 [7, 19] | 13 [8, 19] | 13 [7, 18] |
| NIHSS score at 24 h | 8 [13, 16] | 7 [3, 15] | 12 [6, 19] |
| NIHSS score at 48 h | 7 [2, 15] | 6 [2, 14] | 12 [4, 20] |
| Early neurological deterioration (%) | 7.7 | 5.8 | 16.5 |
| TOAST: Atherothrombotic (%) | – | 22.9 | – |
| TOAST: Cardioembolic (%) | – | 36.3 | – |
| TOAST: Lacunar (%) | – | 8.7 | – |
| TOAST: Undetermined (%) | – | 30.9 | – |
| TOAST: Others (%) | – | 1.2 | – |
| Intravenous fibrinolysis (%) | – | 22.7 | – |
| Thrombectomy (%) | – | 5.2 | – |
| DWI at admission (ml) | – | 33.3 ± 76.9 | – |
| CT volume 4th–7th day (ml) | – | 51.1 ± 82.3 | – |
| Hemorrhagic transformation of IS: IH1 (%) | – | 7.0 | – |
| Hemorrhagic transformation of IS: IH2 (%) | – | 3.1 | – |
| Hemorrhagic transformation of IS: PH1 (%) | – | 1.7 | – |
| Hemorrhagic transformation of IS: PH2 (%) | – | 1.2 | – |
| Etiology of ICH: Hypertensive (%) | – | – | 46.0 |
| Etiology of ICH: Amyloid (%) | – | – | 10.4 |
| Etiology of ICH: Anticoagulants (%) | – | – | 14.2 |
| Etiology of ICH: Others/Undetermined (%) | – | – | 29.4 |
| Hematoma volume at admission (ml) | – | – | 40.3 ± 46.2 |
| Hematoma volume 4th–7th day (ml) | – | – | 51.9 ± 48.1 |
| Total hematoma volume (ml) | – | – | 68.3 ± 53.1 |
| Volume of hypodensity (ml) | – | – | 15.2 ± 17.9 |
| Hematoma growth (ml) | – | – | 11.9 ± 27.6 |
| Topography: Deep hemispheric (%) | – | – | 50.0 |
| Topography: Lobar (%) | – | – | 39.6 |
| Topography: Cerebellar (%) | – | – | 4.7 |
| Topography: Brainstem (%) | – | – | 3.8 |
| Topography: Primary intraventricular (%) | – | – | 1.9 |
| Axillary temperature at admission (ºC) | 36.4 ± 0.7 | 36.4 ± 0.7 | 36.6 ± 0.8 |
| Blood glucose at admission (mg/dl) | 137.6 ± 56.3 | 137.3 ± 57.9 | 138.9 ± 48.1 |
| Sedimentation rate (mm) | 26.4 ± 23.1 | 26.5 ± 23.1 | 26.2 ± 23.1 |
| Glycosylated hemoglobin (%) | 6.1 ± 2.1 | 6.1 ± 2.3 | 5.8 ± 0.9 |
| LDL cholesterol (mg/dl) | 101.9 ± 42.9 | 112.5 ± 44.4 | 109.6 ± 35.2 |
| HDL cholesterol (mg/dl) | 41.2 ± 18.5 | 41.8 ± 18.5 | 38.8 ± 18.3 |
| Triglycerides (mg/dl) | 118.3 ± 63.1 | 121.2 ± 65.1 | 109.4 ± 50.7 |
| Platelets (× 10^3/ml) | 215.4 ± 82.9 | 217.7 ± 83.7 | 203.3 ± 77.9 |
| Hemoglobin (g/dl) | 13.7 ± 1.9 | 13.8 ± 1.9 | 13.5 ± 2.1 |
| DBP at admission (mmHg) | 81.9 ± 16.1 | 81.5 ± 15.8 | 84.3 ± 17.2 |
| SBP at admission (mmHg) | 152.9 ± 27.3 | 152.5 ± 27.3 | 155.5 ± 27.4 |

Table 2. Clinical and neuroimaging variables of the experimented dataset of patients summarized by group.

| Variable | IS + ICH (n = 6022) | IS (n = 4922) | ICH (n = 1100) |
|---|---|---|---|
| Molecular markers | | | |
| Leukocytes at admission (× 10^3/ml) | 8.9 ± 3.1 | 9.1 ± 3.2 | 8.8 ± 3.3 |
| Fibrinogen at admission (mg/dl) | 443.9 ± 101.7 | 444.5 ± 101.8 | 444.1 ± 101.5 |
| C-reactive protein at admission (mg/dl) | 2.7 ± 3.8 | 3.6 ± 4.2 | 5.2 ± 5.2 |
| Microalbuminuria (mg/24 h) | 7.9 ± 26.2 | 5.9 ± 25.9 | 16.7 ± 30.0 |
| NT-pro-BNP levels (pg/ml) | 915.9 ± 1563.7 | 1581.2 ± 1886.1 | 1013.8 ± 3620.2 |
| Outcome at 3 months | | | |
| mRS | 2 [1, 4] | 2 [0, 4] | 3 [1, 6] |
| Poor outcome (%) | 49.6 | 47.5 | 58.6 |
| Morbidity (%) | 35.0 | 33.4 | 27.6 |
| Mortality (%) | 16.3 | 13.2 | 30.2 |

Table 3. Molecular markers and outcome at 3 months of the experimented dataset of patients summarized by group.

Figure 1. Flowchart of patient groups and functional outcome.

Prediction of mortality. Figure 2A shows the most important variables for the model in the IS, ICH and IS + ICH patient groups for mortality prediction. The value shown is, in all cases, the sum of the importance that the algorithm assigned internally to the variable in each of the experiments. The most important variables, taking into account the three groups of patients analyzed, were NIHSS (48) and NIHSS (24). In the IS and ICH patient groups, the importance of ED, T (0) and NIHSS (0) should be highlighted. NIHSS (0) was also observed to be more important in patients with ICH than in those with IS when the models do not have data from both types of patients. It seems, however, that its importance is considerably reduced when the model has the complete set.
Finally, LEU (0) and GLU (0) are variables that help balance the results for the complete model, reducing the variability of the individual IS or ICH models among all variables. The variation in area under the receiver operating characteristics (ROC) curve (AUC) across all the repetitions performed is detailed in Fig. 2B,C for the three experiments. The complete problem with both types of patients is the most stable, with minimal deviation between experiments [median AUC of 0.904 ± 0.025 and accuracy (ACC) of 0.825 ± 0.030]. On the other hand, the ICH problem is the one in which RF has the most difficulty making adequate predictions: the results range over more than 20 AUC points between the best experiment (AUC of 0.9837 with ACC of 0.94) and the worst (AUC of 0.7104 and ACC of 0.6122), with values over 100 repetitions of 0.875 ± 0.048 for AUC and 0.8 ± 0.052 for ACC. This prediction is therefore the most complex for the model. As to the IS problem, RF presented values similar to those of the IS + ICH prediction problem, although variability between experiments is slightly higher (AUC of 0.909 ± 0.032 and ACC of 0.833 ± 0.040). This led us to conclude that there is enough information within the selected variables so that, when RF has enough patients in the dataset, the model predicts very accurately which patients are most likely to die on the basis of the data collected at admission. It was also observed that when there is a greater amount of data and patients are stratified into the three categories, the model is much more stable and results are better.

Figure 2. Mortality prediction for IS + ICH, IS and ICH groups. (A) Main variables for the machine learning model: NIHSS score at admission [NIHSS (0)]; NIHSS score at 24 h [NIHSS (24)]; NIHSS score at 48 h [NIHSS (48)]; Axillary temperature at admission [T(0)]; Early neurological deterioration [ED]; Leukocytes at admission [LEU (0)]; and Blood glucose at admission [GLU (0)]. (B) AUROC values obtained. (C) ROC curves for the Random Forest classifier.

A 2D-heatmap of mortality predictions against NIHSS (48) and NIHSS (24) is detailed in Fig. 3 in order to explain the decision boundary of the model. Note that the misclassified items are highlighted and that the intensity of the colors also indicates the certainty of the prediction. We show the IS + ICH group, as it was the most stable for mortality prediction (AUC of 0.904 ± 0.025).

Figure 3. 2D-heatmap of mortality (EXT) predictions against NIHSS(48) and NIHSS(24). Model results are shown for the IS + ICH group, as it was the most stable for mortality prediction (0.904 ± 0.025 of AUC). Red areas correspond to patients who do not die (0), blue areas correspond to patients who die (1), and misclassified items are highlighted.

Prediction of morbidity. Figure 4A shows that NIHSS (48) and NIHSS (24) are again the most important variables for the model in the three groups studied for morbidity prediction. However, ED for the IS and total groups, and GLU (0) for ICH patients, also seem to provide relevant predictive capacity. NIHSS (0) was identified by the model as a variable with negative effects on classification for the IS patient group, as it worsens prediction. For the ICH patient group, however, NIHSS (0) is a key variable. Figure 4B,C shows that there were no major differences between the IS and IS + ICH groups (AUC of 0.738 and 0.755; ACC of 0.683 and 0.700), but both differed from the ICH group (AUC of 0.667 and ACC of 0.618).

AUC with 7 input variables. Figure 5 compares the ROC curves of the most important variables of the ML model for the IS + ICH, IS and ICH patient groups, for both mortality and morbidity prediction. In the IS + ICH and IS groups, NIHSS (24) and NIHSS (48) were the variables with the best AUC values for both morbidity [0.677 (CI 95% 0.662–0.692) vs. 0.703 (CI 95% 0.686–0.719); and 0.669 (CI 95% 0.654–0.684) vs. 0.697 (CI 95% 0.680–0.713)] and mortality [0.888 (CI 95% 0.876–0.900) vs. 0.892 (CI 95% 0.878–0.906); and 0.897 (CI 95% 0.885–0.908) vs. 0.899 (CI 95% 0.885–0.913)] prediction. In the ICH group, NIHSS (0) and NIHSS (24) had the best AUC values for morbidity [0.591 (CI 95% 0.553–0.629) and 0.588 (CI 95% 0.550–0.626)], and NIHSS (24) and NIHSS (48) for mortality [0.865 (CI 95% 0.838–0.891) and 0.873 (CI 95% 0.847–0.899)].

Discussion

It is difficult but essential to accurately predict functional outcomes after stroke. Outcome prediction plays an important role in long-term decision making, patient treatment, organization of health centers, and domestic conditions. Plans could be developed on the basis of a better prediction of the degree of recovery of each patient, with appropriate and individualized rehabilitation measures that take into account the domestic and economic conditions, leading to shared decisions with patients, relatives, and sociomedical centers [27,29].

The conventional approach to the evaluation of stroke outcome data resorts to classical statistical models (logistic regression). Logistic regression models identify and validate predictive variables. Their main advantage is that they can be easily implemented and interpreted [24].
ML algorithms have the potential to outperform conventional regression because they are able to capture nonlinearities and complex interactions among multiple predictor variables. They can also handle large-scale multi-institutional data, with the added advantage of easily incorporating newly available data to improve prediction performance, and they cope better with a large number of predictors [30]. A recent systematic review found that ML was not superior to logistic regression in clinical prediction modeling [31]. However, applied clinical studies that compare classical models with different machine learning algorithms have often reached inconsistent conclusions. These studies agree that further research is needed to assess the feasibility and acceptance of ML applications in clinical practice [32,33].

Different research works have proposed strategies for stroke prediction based on ML algorithms with excellent results, but with great diversity in the variables analyzed (clinical, molecular markers or imaging), the calibration/training protocols performed, and the models implemented (neural networks, tree-based and kernel-based methods). Some limitations of these previous studies are: (1) the low sample size used; (2) the characteristics of the patients evaluated, since most of the studies only evaluate IS sub-groups, such as IS patients treated with rtPA or endovascular intervention; (3) the use of demographic, clinical, molecular or neuroimaging variables independently rather than in combination; and (4) the small number of variables used in ML models. Asadi et al. built models of the dichotomized modified Rankin Scale (mRS) in acute ischemic stroke (n = 107) and reported an AUC of 0.6 for an ANN and 70% accuracy for an SVM [24]. Bentley et al. developed an SVM model to study acute ischemic stroke patients (n = 116) at risk of symptomatic intracranial hemorrhage using CT brain images [23]. Monteiro et al. applied ML techniques (RF, XGBoost, SVM and decision tree) to predict the functional outcome of ischemic stroke patients (n = 425) treated with recombinant tissue plasminogen activator (rtPA) 3 months after the initial stroke. They started with only the information available at admission and then analyzed how prediction improves as features collected at later points in time are added. The ML approach achieved an AUC of 0.808 with the features available at admission, and as new features were progressively added, AUC increased to a value above 0.90 [25]. Heo et al. studied the applicability of machine learning-based models in a prospective cohort of 2604 patients with acute ischemic stroke. The AUC obtained was 0.888 for the DNN model, 0.857 for the RF model, and 0.849 for the logistic regression model [26]. Recently, Lin et al. used the Taiwan Stroke Registry (n = 40,293) to evaluate several ML approaches (SVM, RF, ANN, and HANN) for prediction of 90-day stroke outcomes. ML techniques achieved an AUC over 0.94 in both ischemic and hemorrhagic stroke using preadmission and inpatient data [27]. Alaka et al. concluded that logistic regression and ML models had comparable predictive accuracy at 90 days when validated internally (AUC range = [0.65–0.72]) and externally (AUC range = [0.66–0.71]) in acute IS patients after endovascular treatment (n = 614–684) [28].

Figure 4. Morbidity prediction for IS + ICH, IS and ICH groups. (A) Main variables for the machine learning model: NIHSS score at admission [NIHSS (0)]; NIHSS score at 24 h [NIHSS (24)]; NIHSS score at 48 h [NIHSS (48)]; Axillary temperature at admission [T(0)]; Early neurological deterioration [ED]; Leukocytes at admission [LEU (0)]; and Blood glucose at admission [GLU (0)]. (B) AUROC values obtained. (C) ROC curves for the Random Forest classifier.
In our study, we analyzed an ML model of stroke outcome prediction at 3 months using the hospital's stroke registry (BICHUS), on the basis of demographic, clinical, molecular and neuroimaging variables. Mortality and morbidity were evaluated by identifying the main variables for the ML model. Data were studied as a whole (IS + ICH) or as independent subsets. For the mortality outcome, our ML classifiers exhibited high performance, with AUC of about 0.90 in the IS + ICH and IS groups (0.904 and 0.909) and 0.875 in the ICH group. The IS group (n = 4922) had the best results. The model indicates that the most relevant variables are NIHSS (48) and NIHSS (24). In addition, NIHSS (0) is also important for the ICH patients (n = 1100). The rest of the variables contribute only marginally, although the importance of T(0) and ED should not be disregarded. For the morbidity outcome, AUC was lower (0.755, 0.738 and 0.667 in the IS + ICH, IS and ICH groups). The model indicates that the most relevant variables are again NIHSS (48) and NIHSS (24), although ED for the IS group and GLU (0) for the ICH group also provide predictive capability. Compared with the ROC curve analysis of the 7 individual input variables, the ML classifier performs well in the three groups, with NIHSS (0), NIHSS (24) or NIHSS (48) as the most influential predictors.

Figure 5. Comparison of ROC curves of 7 variables selected for machine learning experiments for mortality and morbidity prediction at 3 months of the different patient groups evaluated. (A,B) Morbidity and mortality of IS + ICH group. (C,D) Morbidity and mortality of IS group. (E,F) Morbidity and mortality of ICH group.

The findings in this report are subject to at least four limitations. First, this was a retrospective, single-center study with a relatively small clinical dataset.
The intrinsic need for large training datasets may affect the accuracy of ML models: a model may be overfitted to irrelevant clinical predictors, or some predictors may be underestimated, thus increasing random errors. It is important to note that the variables selected in our study had been previously identified by means of a T-test and supervised by expert neurologists. However, in the future, both training and validation procedures will need to include a multicenter dataset and a prospective study to verify the model and the variables obtained. Second, the IS and the ICH patient groups were unbalanced. We consider, however, that the two types of stroke should be studied independently to find both differences and similarities. Third, we used RF machine learning algorithms, although other models like DNN or deep logistic regression could be used for comparison purposes. Fourth, we used typical clinical variables as inputs for the ML model, and we did not stratify patients into different subgroups, which could improve the results presented. We consider that it would be useful to evaluate the common variables from a clinical point of view, which once again emphasizes the importance of NIHSS, axillary temperature and blood glucose.

The major strengths of the present study include the large sample size (6022 patients; 4922 with IS and 1100 with ICH), which made it possible to study the combination of different stroke types in detail (IS + ICH, IS, and ICH). Furthermore, to derive a global risk score for stroke, we evaluated and interrelated demographic, clinical, biochemical and neuroimaging variables.
Another distinctive feature of this analysis compared with previous studies is that we also included molecular markers associated with inflammation (leukocytes, fibrinogen and C-reactive protein) and with endothelial and atrial dysfunction (microalbuminuria and NT-proBNP).

Conclusions

Machine learning algorithms, particularly Random Forest, can be effectively used in long-term outcome prediction of mortality and morbidity of stroke patients. NIHSS at 24 and 48 h and axillary temperature are the most important variables to consider in the evolution of the patients at 3 months. Future studies could incorporate the use of imaging and genetic information. Furthermore, the model developed could be used in other applications and different scopes with similar data, such as traumatic brain injury or dementia (Alzheimer's and Parkinson's disease).

Materials and methods

Patient selection. The dataset used in this research work consisted of patients with IS and ICH admitted to the Stroke Unit of the Hospital Clínico Universitario of Santiago de Compostela (Spain), who were prospectively registered in an approved data bank (BICHUS). All patients were treated by a certified neurologist according to national and international guidelines. Exclusion criteria for this analysis were: (1) patients who died during the first 24 h, and (2) loss of follow-up (personal interview or telephone contact) at 3 months. The analysis of the data for this study was retrospective and covered September 2007 to September 2017. This research was carried out in accordance with the Declaration of Helsinki of the World Medical Association (2008) and approved by the Ethics Committee of Santiago de Compostela (2019/616). All patients or their relatives signed the informed consent for inclusion in the registry and for anonymous use of their personal data for research purposes.

Demographic, clinical, molecular and neuroimaging variables.
The registry includes demographic variables, previous medical history and vital signs. Blood samples for hemogram, biochemistry and coagulation tests were obtained and analyzed at the central hospital laboratory. Neurological deficit was evaluated by a certified neurologist using the National Institute of Health Stroke Scale (NIHSS) at admission, and every 24/48 h during hospitalization. The modified Rankin Scale (mRS) was used to evaluate functional outcome at discharge and at 3 months [33,34]. Effective reperfusion of IS patients was defined as ≤ 8 points in the NIHSS during the first 24 h. Early neurological deterioration was defined as an increase of ≥ 4 points in NIHSS within the first 48 h with respect to the baseline NIHSS score. Poor functional outcome was defined as mRS > 2 at 3 months, morbidity as 3 ≤ mRS ≤ 5, and mortality as mRS = 6. Ischemic stroke diagnosis was made using the TOAST criteria [35].

Computed Tomography (CT) was performed in all patients and Magnetic Resonance Imaging (MRI) in selected patients at admission. A follow-up CT scan after fibrinolysis or thrombectomy was performed in all IS patients at 24 h, and a further CT at 48 h or when neurological deterioration was detected, and between the 4th and 7th day. ICH and perihematomal edema volumes were calculated using the ABC/2 method [36]. ICH topography was classified as lobar when it predominantly affected the cortical/subcortical white matter of the cerebral lobes, or as deep when it was limited to the internal capsule, the basal ganglia or the thalamus. All neuroimaging tests were analyzed by a neuro-radiologist supervised by the above certified neurologist.

Outcome endpoints. The objective of this research work was to identify the main predictors and to generate a machine learning model for the prediction of mortality and morbidity of stroke patients, stratified into one of the following groups: (1) IS + ICH, (2) IS, and (3) ICH.

Machine learning.
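Before any model is trained, the endpoint definitions given above have to be turned into labels. The following Python sketch shows one way this could be written; it is an illustration, not the registry's actual code. The function names are hypothetical, and reading early neurological deterioration as a rise of at least 4 NIHSS points over baseline at any assessment in the first 48 h is an assumption about how the definition is applied.

```python
def outcome_labels(mrs_3m):
    """Map the 3-month modified Rankin Scale (0-6) to the study endpoints."""
    if not 0 <= mrs_3m <= 6:
        raise ValueError("mRS must be between 0 and 6")
    return {
        "poor_outcome": mrs_3m > 2,      # mRS > 2
        "morbidity": 3 <= mrs_3m <= 5,   # 3 <= mRS <= 5
        "mortality": mrs_3m == 6,        # mRS = 6
    }


def early_deterioration(nihss_baseline, nihss_first_48h):
    """True if any NIHSS score in the first 48 h is >= 4 points above baseline.

    nihss_first_48h is a list of follow-up scores (for example at 24 and 48 h).
    """
    return any(score - nihss_baseline >= 4 for score in nihss_first_48h)


# Made-up example patient: baseline NIHSS 10, then 12 and 15; mRS 4 at 3 months.
labels = outcome_labels(4)
deteriorated = early_deterioration(10, [12, 15])
```

Under these definitions morbidity and mortality are mutually exclusive, and a patient with a poor outcome falls into exactly one of the two.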
We used the RF algorithm for the prediction of mortality and morbidity of stroke patients. RF is an ensemble learning method, i.e., a strategy that aggregates many predictions to reduce the variance and improve the robustness and precision of outputs [37–39]. A remarkable characteristic of RF is that it provides an internal measure of the relative importance of each feature for the prediction. The model generally works well across many types of problem, regardless of size and even if the data are unbalanced or missing [37]. To analyze the importance of the variables used by the model, the Gini importance index was calculated. Each time a node is split on a variable, the Gini impurity of the two child nodes is lower than that of the parent; the index accumulates these decreases in impurity for each variable. It is not a simple final summation of the values obtained in all the trees for each variable but a weighted one.

Generally speaking, all ML algorithms have a number of hyperparameters that must be optimized to obtain the best results for the particular problem they are analyzing. We used R [40,41] and the following packages: mlr [42] to calculate the best number of trees (ranging from 500 to 1000); randomForest [43] for our experiments; and ggplot2 [44] for graphics and data analysis.

Data pre-processing. Balanced classes in classification problems are critical for ML algorithms. When analyzing the problem, we initially obtained AUC values lower than 0.65 in the best of cases using 65 features and four different ML algorithms, which is considered poor predictive performance.
The low initial AUC is mainly due to the fact that our dataset contains a much higher percentage of patients who survived than of patients who died, and to noise and correlation between features, which confound the predictors. This is an unbalanced problem that needs to be addressed, since a predictive model for patient death is being generated. There are two main approaches to balancing the data: (1) oversampling the minority class or (2) undersampling the majority class. Although these are very powerful techniques that can increase the performance of classifiers, they must be handled cautiously, especially in medical problems, to prevent overfitting or a loss of generalization capacity in the models when new synthetic samples are included (oversampling) in the learning phase of the algorithms [45]. In this work, undersampling methods (random undersampling of the majority class) were assessed for class balancing purposes. To ensure that the undersampling process is fair and that the generalization capability of the models is not biased, we ran 100 repetitions, each with a different random undersampling, of a tenfold cross-validation experiment to observe the behavior and stability of the results. The more stable the results, the better the random removal process: stability means that the remaining samples of the majority class captured the underlying knowledge of the class. The experimental design developed to analyze the original data included a data preprocessing phase with balancing of the classes, followed by tenfold cross-validation with 100 repetitions. For each of these repetitions, the position of each patient in the dataset was randomized. The preprocessed data were also randomized to avoid any potential process-related bias. The problem was broken down into six different but complementary and informative problems: mortality and morbidity prediction with IS, ICH or IS + ICH patients.
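The resampling design described above (a fresh random undersampling for each repetition, followed by tenfold cross-validation) can be sketched as follows. This is a minimal standard-library Python illustration, not the R/mlr code used in the study; the helper names `undersample`, `kfold` and `repeated_design`, the unstratified fold assignment and the toy 20/180 class split are all assumptions. The classifier itself is omitted: each yielded split would be passed to a model of choice.

```python
import random


def undersample(labels, rng):
    """Return shuffled indices with the majority class randomly cut to the minority size."""
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)  # randomize patient order for this repetition
    return kept


def kfold(indices, k=10):
    """Yield (train, test) index lists; fold i takes every k-th element starting at i."""
    for i in range(k):
        test = indices[i::k]
        held_out = set(test)
        train = [j for j in indices if j not in held_out]
        yield train, test


def repeated_design(labels, n_repeats=100, k=10, seed=0):
    """Yield (repetition, fold, train, test) for repeated undersampling plus k-fold CV."""
    rng = random.Random(seed)
    for rep in range(n_repeats):
        balanced = undersample(labels, rng)  # a different random subset each repetition
        for fold, (train, test) in enumerate(kfold(balanced, k)):
            yield rep, fold, train, test


# Toy labels: 1 = died, 0 = survived (hypothetical imbalance, not the study's figures).
labels = [1] * 20 + [0] * 180
splits = list(repeated_design(labels, n_repeats=3, k=10, seed=42))
```

Collecting one AUC per split and comparing the spread across repetitions is one way to judge how much the results depend on which majority-class samples were kept.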
This breakdown sought to analyze more exhaustively the differences between the types of patients when predicting death/poor outcome and to determine whether the variables with the most weight in the prediction were the same in all cases. In order to identify which of the 65 available variables are the most informative, we performed feature selection. There are three main approaches to feature selection in machine learning: filter, wrapper and embedded [46]. Filter methods assess the relevance of each feature by looking only at the intrinsic properties of the data (independently of the algorithms). We calculated a feature relevance score (t-test) on the training data, and low-scoring features were removed using a manually chosen cut-off point to reduce the variance of the models (see Supplementary Fig. S1 and Supplementary Material).

Statistical analysis. For the descriptive study, results were expressed as percentages for categorical variables and as mean (SD) or median (quartiles) for continuous variables, depending on whether their distribution was normal or not. The Kolmogorov–Smirnov test was used to test the normality of the distribution. To measure the performance of the model we used the area under the receiver operating characteristic (ROC) curve (AUC or AUROC) [47]. To train and validate the model we used tenfold cross-validation. AUC results are presented as mean ± SD calculated over the tenfold validation sets. To test whether logistic regression and the ML models could obtain similar AUC results, ROC curve analysis was used to compare the 7 input variables selected for the ML experiments of the different patient groups as potential clinical markers of morbidity and mortality at 3 months. The descriptive statistical analysis was conducted in SPSS 25.0 (IBM, Chicago, IL) for Mac.

Data availability. All data are available within the text of the manuscript.
Further anonymized data could be made available to qualified investigators upon reasonable request.

Received: 24 December 2020; Accepted: 26 April 2021

References
1. Neuhaus, A. A. et al. Neuroprotection in stroke: The importance of collaboration and reproducibility. Brain 140, 2079–2092 (2017).
2. Bramlett, H. M. & Dietrich, W. D. Pathophysiology of cerebral ischemia and brain trauma: Similarities and differences. J. Cereb. Blood Flow Metab. 24, 133–150 (2004).
3. Burns, J. D., Fisher, J. L. & Cervantes-Arslanian, A. M. Recent advances in the acute management of intracerebral hemorrhage. Neurosurg. Clin. N. Am. 29, 263–272 (2018).
4. Béjot, Y. et al. Epidemiology of stroke in Europe and trends for the 21st century. Presse. Med. 45, e391–e439 (2016).
5. Rodríguez-Castro, E. et al. Trends in stroke outcome in the last ten years in a European tertiary hospital. BMC Neurol. 18, 164 (2018).
6. Fens, M. et al. Multidisciplinary care for stroke patients living in the community: A systematic review. J. Rehabil. Med. 45, 321–330 (2013).
7. Sen, A. et al. Continuous hemodynamic monitoring in acute stroke: An exploratory analysis. Emerg. Med. 15, 345–350 (2014).
8. Elmaraezy, A. et al. Desmoteplase for acute ischemic stroke: A systematic review and metaanalysis of randomized controlled trials. CNS Neurol. Disord. Drug Targets 16, 789–799 (2017).
9. Lin, Y. et al. Endovascular thrombectomy for acute ischemic stroke: A meta-analysis. CNS Neurol. Disord. Drug Targets 314, 1832–1843 (2015).
10. Baratloo, A. et al. Effects of telestroke on thrombolysis times and outcomes: A meta-analysis. Prehosp. Emerg. Care 22, 472–484 (2018).
11. Olson, R. S. et al. Data-driven advice for applying machine learning to bioinformatics problems. Pac. Symp. Biocomput. 23, 192–203 (2018).
12. Deist, T. M. et al. Machine learning algorithms for outcome prediction in (chemo)radiotherapy: An empirical comparison of classifiers. Med. Phys. 46, 1080–1087 (2019).
13. Elola, A. et al. ECG-based pulse detection during cardiac arrest using random forest classifier. Med. Biol. Eng. Comput. 57, 453–462 (2019).
14. Shaikhina, T. et al. Decision tree and random forest models for outcome prediction in antibody incompatible kidney transplantation. Biomed. Signal Process. Control 52, 456–462 (2019).
15. Liu, Y. et al. Experimental study and random forest prediction model of microbiome cell surface hydrophobicity. Expert. Syst. Appl. 72, 306–316 (2017).
16. Chowdhury, A. R., Chatterjee, T. & Banerjee, S. A random forest classifier-based approach in the detection of abnormalities in the retina. Med. Biol. Eng. Comput. 57, 193–203 (2019).
17. Dmitriev, K. et al. Classification of pancreatic cysts in computed tomography images using a random forest and convolutional neural network ensemble. Med. Image Comput. Assist. Interv. 10435, 150–158 (2017).
18. Alickovic, E. & Subasi, A. Breast cancer diagnosis using GA feature selection and rotation forest. Neural Comput. Appl. 28, 753–763 (2017).
19. Mitra, J. et al. Lesion segmentation from multimodal MRI using random forest following ischemic stroke. Neuroimage 98, 324–335 (2014).
20. Maier, O. et al. Extra tree forests for sub-acute ischemic stroke lesion segmentation in MR sequences. J. Neurosci. Methods 240, 89–100 (2015).
21. McKinley, R. et al. Fully automated stroke tissue estimation using random forest classifiers (FASTER). J. Cereb. Blood Flow Metab. 37, 2728–2741 (2017).
22. Subudhi, A. et al. Automated approach for detection of ischemic stroke using Delaunay Triangulation in brain MRI images. Comput. Biol. Med. 103, 116–129 (2018).
23. Bentley, P. et al. Prediction of stroke thrombolysis outcome using CT brain machine learning. NeuroImage Clin. 4, 635–640 (2014).
24. Asadi, H. et al. Machine learning for outcome prediction of acute ischemic stroke post intra-arterial therapy. PLoS ONE 9, e88225 (2014).
25. Monteiro, M. et al. Using machine learning to improve the prediction of functional outcome in ischemic stroke patients. IEEE/ACM Trans. Comput. Biol. Bioinform. 15, 1953–1959 (2018).
26. Heo, J. et al. Machine learning-based model can predict stroke outcome. Stroke 49, A194A (2018).
27. Lin, C. H. et al. Evaluation of machine learning methods to stroke outcome prediction using a nationwide disease registry. Comput. Methods Programs Biomed. 190, 105381 (2020).
28. Alaka, S. A. et al. Functional outcome prediction in ischemic stroke: A comparison of machine learning algorithms and regression models. Front. Neurol. 11, 889 (2020).
29. Stinear, C. M. Prediction of recovery of motor function after stroke. Lancet Neurol. 9, 1228–1232 (2010).
30. Alpaydin, E. Introduction to Machine Learning (MIT Press, Cambridge, Massachusetts, 2010).
31. Christodoulou, E. et al. A systematic review shows no performance benefit of machine learning over logistic regression for clinical prediction models. J. Clin. Epidemiol. 110, 12–22 (2019).
32. Lynam, A. L., Dennis, J. M. & Owen, K. R. Logistic regression has similar performance to optimised machine learning algorithms in a clinical setting: Application to the discrimination between type 1 and type 2 diabetes in young adults. Diagn. Progn. Res. 4, 6 (2020).
33. Montaner, J. & Álvarez-Sabín, J. NIHSS Stroke Scale and its adaptation to Spanish. Neurologia 21, 192–202 (2006).
34. Bonita, R. & Beaglehole, R. Modification of Rankin scale: Recovery of motor function after stroke. Stroke 19, 1497–1500 (1988).
35. Adams, H. P. Jr. et al. Classification of subtype of acute ischemic stroke. Definitions for use in a multicenter clinical trial. TOAST. Trial of Org 10172 in Acute Stroke Treatment. Stroke 24, 35–41 (1993).
36. Sims, J. R. et al. ABC/2 for rapid clinical estimate of infarct, perfusion, and mismatch volumes. Neurology 72, 2104–2110 (2009).
37. Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
38. Breiman, L. Bagging predictors. Mach. Learn. 24, 123–140 (1996).
39. Amit, Y. & Geman, D. Shape quantization and recognition with randomized trees. Neural. Comput. 9, 1545–1588 (1997).
40. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing. Vienna, Austria (2020). https://www.R-project.org/.
41. R Core Team. The R Project for Statistical Computing. Vienna, Austria (2019).
42. Bischl, B. et al. mlr: Machine learning in R. J. Mach. Learn. Res. 17, 1–5 (2016).
43. Liaw, A. & Wiener, M. Classification and regression by randomForest. R News 2, 18–22 (2002).
44. Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer-Verlag, New York, 2016). ISBN: 978-3-319-24277-4.
45. Lopez, V. et al. An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics. Inf. Sci. 250, 113–141 (2013).
46. Saeys, Y., Inza, I. & Larrañaga, P. A review of feature selection techniques in bioinformatics. Bioinformatics 23, 2507–2517 (2007).
47. Bradley, A. P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 30, 1145–1159 (1997).

Acknowledgements
None.

Author contributions
Drs. J.C., S.R.-Y., and R.I.-R. conceived the scientific idea, designed the experiments, assisted with statistical analysis, and provided input in the writing of the manuscript. Drs. C.F.-L., V.M.-A., and S.S.-G. performed the machine learning analysis and provided input in the writing of the manuscript. Drs. M.R.-Y. and I.L.-D. had a major role in the acquisition of data and the neuroimaging study, interpreted the data, and revised the manuscript. A.E.-G., T.S., F.C., and P.H. provided discussions on the project throughout, interpreted data and also provided input in the writing of the manuscript. All authors reviewed the manuscript.
Funding
This study was partially supported by grants from the Spanish Ministry of Science and Innovation (SAF2017-84267-R), Xunta de Galicia (Axencia Galega de Innovación (GAIN): IN607A2018/3), Instituto de Salud Carlos III (ISCIII) (PI17/00540, PI17/01103), Spanish Research Network on Cerebrovascular Diseases RETICS-INVICTUS PLUS (RD16/0019) and by the European Union FEDER program. T. Sobrino (CPII17/00027) and F. Campos (CPII19/00020) are recipients of research contracts from the Miguel Servet Program (Instituto de Salud Carlos III). This work was also supported by the General Directorate of Culture, Education and University Management of Xunta de Galicia (ED431G/01,252 ED431D 2017/16), the "Galician Network for Colorectal Cancer Research" (Ref. ED431D 2017/23), Competitive Reference Groups (ED431C 2018/49), the Spanish Ministry of Economy and Competitiveness via funding of the unique installation BIOCAI (UNLC08-1E-002, UNLC13-13–3503), and European Regional Development Funds (FEDER).

Competing interests
The authors declare no competing interests.

Additional information
Supplementary Information The online version contains supplementary material available at https://doi.org/10.1038/s41598-021-89434-7.
Correspondence and requests for materials should be addressed to S.R.-Y. or R.I.-R.
Reprints and permissions information is available at www.nature.com/reprints.
Publisher's note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

© The Author(s) 2021