Fmed 10 1174631
REVIEWED BY
Chandra Segar T, Vellore Institute of Technology (VIT), India
Diana Calaras, Nicolae Testemiţanu State University of Medicine and Pharmacy, Moldova
Christophe Delclaux, Hôpital Robert Debré, France

Luka Beverin¹, Marko Topalovic², Armin Halilovic², Paul Desbordes², Wim Janssens³ and Maarten De Vos⁴,⁵*
1 Statistics Research Centre, KU Leuven, Leuven, Belgium
2 ArtiQ NV, Leuven, Belgium
3 Laboratory of Respiratory Diseases and Thoracic Surgery, Department of Chronic Diseases Metabolism and Ageing, KU Leuven, Leuven, Belgium
4 Stadius, Department of Electrical Engineering, KU Leuven, Leuven, Belgium
5 Department of Development and Regeneration, KU Leuven, Leuven, Belgium

*CORRESPONDENCE
Maarten De Vos
maarten.devos@kuleuven.be
RECEIVED 26 February 2023
ACCEPTED 13 April 2023
PUBLISHED 19 May 2023

CITATION
Beverin L, Topalovic M, Halilovic A, Desbordes P, Janssens W and De Vos M (2023) Predicting total lung capacity from spirometry: a machine learning approach. Front. Med. 10:1174631. doi: 10.3389/fmed.2023.1174631

COPYRIGHT
© 2023 Beverin, Topalovic, Halilovic, Desbordes, Janssens and De Vos. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Background and objective: Spirometry patterns can suggest that a patient has a restrictive ventilatory impairment; however, lung volume measurements such as total lung capacity (TLC) are required to confirm the diagnosis. The aim of the study was to train a supervised machine learning model that can accurately estimate TLC values from spirometry and subsequently identify which patients would most benefit from undergoing a complete pulmonary function test.

Methods: We trained three tree-based machine learning models on 51,761 spirometry data points with corresponding TLC measurements. We then compared model performance using an independent test set consisting of 1,402 patients. The best-performing model was used to retrospectively identify restrictive ventilatory impairment in the same test set. The algorithm was compared against different spirometry patterns commonly used to predict restriction.

Results: The prevalence of restrictive ventilatory impairment in the test set was 16.7% (234/1402). CatBoost was the best-performing machine learning model. It predicted TLC with a mean squared error (MSE) of 560.1 mL. The sensitivity, specificity, and F1-score of the optimal algorithm for predicting restrictive ventilatory impairment were 83, 92, and 75%, respectively.

Conclusion: A machine learning model trained on spirometry data can estimate TLC to a high degree of accuracy. This approach could be used to develop future smart home-based spirometry solutions, which could aid decision making and self-monitoring in patients with restrictive lung diseases.
KEYWORDS
restriction, spirometry, machine learning, interstitial lung disease, total lung capacity
1. Introduction

Restrictive lung disorders are a group of conditions that affect the ability of the lungs to expand fully, resulting in reduced lung capacity and difficulty breathing. These conditions are typically caused by either intrinsic or extrinsic factors, such as interstitial lung diseases or chest wall problems (1). Patients with restrictive lung disorders often experience a decreased quality of life and increased morbidity, as the reduced lung capacity can make it difficult for them to engage in physical activities and perform everyday tasks (2). While the true population prevalence of restrictive diseases is unknown, their occurrence is estimated at 3–6 persons per 100,000 in the United States (3).

The diagnostic criterion for restrictive lung disease is a total lung capacity (TLC) that falls below the lower limit of normal (LLN), which is defined as the fifth percentile of a healthy population. TLC can be measured with five different standardized methods: whole-body plethysmography (WBP), helium dilution, nitrogen gas washout, chest radiographs, and computed tomography scanning (4, 5). However, these methods are not widely available in primary care, require expert knowledge, and are costly for routine use. As a result, primary care clinicians often rely on spirometry results to identify potential cases of lung restriction and decide which patients should undergo further pulmonary function testing.

In recent years, the use of home-based spirometry to monitor lung function in patients with interstitial lung disease (ILD) has gained attention in clinical practice and research (6–8). Home-based spirometry has the potential to increase convenience and accessibility for patients with ILD, improve the frequency of data collection, and make it easier for patients to receive regular assessments of their lung function. In addition, the integration of smartphone applications has facilitated communication and collaboration between patients and healthcare providers. With advances in machine learning (ML) and an increasing amount of health data available for analysis, it is becoming more feasible to use ML algorithms to improve both the quality and the interpretation of pulmonary function testing (9, 10). Despite the potential benefits of using ML in home-based spirometry, most research has focused on automating current human tasks (e.g., diagnosis). ML approaches, however, also have the potential to estimate clinically relevant values that standard spirometry does not measure, such as TLC.

The objective of this study was to train a supervised ML model to predict TLC values using patient characteristics and data from spirometry. The secondary objective was to investigate whether these predictions could be used to accurately identify restrictive lung impairment, defined as TLC < LLN, where reference values are derived from the 2012 Global Lung Function Initiative (GLI) equations (11). We evaluate the performance of our model on an independent dataset and compare its ability to identify restrictive lung impairment with that of commonly used clinical guidelines (2005 ATS/ERS standards). Overall, our study investigates the potential use of ML to aid decision-making in office and home-based spirometry by providing accurate and timely predictions of TLC. Moreover, it allows us to investigate in which patient populations such ML-based predictions might be most beneficial.

2. Methods

2.1. Data collection and preprocessing

In this study, we obtained data from two different sources: ArtiQ¹ and University Hospital Leuven. The data from ArtiQ were used to train and tune the ML models, whereas the hospital data were used as an independent test set to evaluate each model's ability to predict TLC and subsequently identify restrictive lung impairment. Both datasets contain only Caucasian patients. The training data collected from ArtiQ consist of patient characteristics and spirometry measurements with a known TLC value. To detect anomalies, we implemented the Isolation Forest algorithm with 100 base estimators (12). We removed all observations with an anomaly score at or above the 99th percentile. After preprocessing, 51,761 unique observations remained, each representing a different patient.

The independent test data set consists of 1,402 patients who performed spirometry and WBP. This data set was formed by combining two cohorts studied in previous work:

1. a prospective cohort study on first-time admissions in a population-based sample (13), and
2. a retrospective cohort study of pulmonary function test (PFT) data (14).

More details on the studies can be found in the corresponding publications. Each subject had a validated clinical diagnosis based on their medical history and complete PFT. The data used for testing the models come from studies approved by the Ethics Committee of the University Hospital in Leuven. The combined cohort data set included patients diagnosed with obstructive (n = 885) and restrictive (n = 288) lung disorders, as well as healthy individuals (n = 229). All patients included in the studies provided informed consent. A cohort description is provided in Table 1.

2.2. Machine learning model training for TLC prediction

For the prediction of TLC, we trained three tree-based ML models: Random Forest (RF) (15), Extreme Gradient Boosting (XGBoost) (16), and Categorical Boosting (CatBoost) (17). These algorithms are well suited to tabular data sets and are commonly used in industry, research, and competitions. The final feature set used for model training consisted of patient characteristics (e.g., age, height, gender, and weight) and well-known spirometry measurements (e.g., FVC, FEV₁, FEV₁/FVC, peak expiratory flow, and forced expiratory flow at different percentages of FVC). For the XGBoost and RF models, one-hot encoding was applied to the categorical feature (gender). These were all the features available for model training.

Hyper-parameters of the models were fine-tuned through a randomized search (18) with 220 sampled hyper-parameter combinations. For the XGBoost model, a total of 43,200 possible combinations were considered; for the CatBoost and RF models, 30,870 and 672 possible combinations were considered, respectively. To find the optimal combination of hyper-parameters, we applied k-fold cross-validation (k-fold CV) to the training data set (19). We set k to 5 because we found it to provide a good balance between computing time, bias, and variance. We then selected the hyper-parameters that resulted in the lowest CV mean squared error (MSE). The modeling process is depicted in Figure 1. For all models, we constructed hyper-parameter grids in accordance with existing literature and best practices from competitive data science platforms such as Kaggle.²
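The tuning procedure described above (randomized search, 5-fold CV, selection by lowest CV MSE) can be sketched with scikit-learn's `RandomizedSearchCV`. This is a minimal illustration, not the authors' code: the paper does not state which library or grid values were used, so the Random Forest grid `EXAMPLE_RF_GRID`, the seed, and the helper name `tune_by_cv_mse` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, ParameterGrid, RandomizedSearchCV

# Hypothetical grid for illustration only; the paper's actual grid values are not listed.
EXAMPLE_RF_GRID = {
    "n_estimators": [100, 300, 500],
    "max_depth": [None, 10, 20],
    "min_samples_leaf": [1, 5],
    "max_features": [0.5, 1.0],
}


def tune_by_cv_mse(X, y, grid=EXAMPLE_RF_GRID, n_iter=220, k=5, seed=0):
    """Randomized search over `grid`, keeping the setting with the lowest k-fold CV MSE."""
    n_combinations = len(ParameterGrid(grid))  # size of the full grid
    search = RandomizedSearchCV(
        RandomForestRegressor(random_state=seed),
        param_distributions=grid,
        # Sample at most n_iter settings; cap at the grid size so we never ask for more than exist.
        n_iter=min(n_iter, n_combinations),
        scoring="neg_mean_squared_error",  # scikit-learn maximizes scores, so MSE is negated
        cv=KFold(n_splits=k, shuffle=True, random_state=seed),
        random_state=seed,
    )
    search.fit(X, y)
    best_cv_mse = -search.best_score_  # undo the sign flip to report a positive MSE
    return search.best_params_, best_cv_mse, n_combinations
```

Because every grid entry is a list, scikit-learn samples settings without replacement, so with `n_iter=220` the search evaluates 220 distinct settings of any grid that has at least that many (for example, 220 of the 672 RF combinations reported above).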
¹ https://www.artiq.eu/
² https://www.kaggle.com/
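The anomaly filtering described in Section 2.1 (Isolation Forest with 100 base estimators, dropping observations whose anomaly score is at or above the 99th percentile) might look as follows. This sketch assumes scikit-learn's `IsolationForest` with otherwise default settings; the paper does not name its implementation, and the helper name `drop_anomalies` and the seed are hypothetical.

```python
import numpy as np
from sklearn.ensemble import IsolationForest


def drop_anomalies(X, n_estimators=100, percentile=99.0, seed=0):
    """Drop rows whose Isolation Forest anomaly score is at or above the given percentile."""
    forest = IsolationForest(n_estimators=n_estimators, random_state=seed).fit(X)
    # score_samples returns the *negated* anomaly score (lower = more abnormal), so flip its sign.
    anomaly_score = -forest.score_samples(X)
    cutoff = np.percentile(anomaly_score, percentile)
    keep = anomaly_score < cutoff  # rows at or above the cutoff are treated as anomalies
    return X[keep], keep
```

Returning the boolean mask alongside the filtered matrix makes it easy to apply the same selection to the TLC target vector.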
TABLE 1 Cohort description.
Characteristics | All subjects in the cohort (n = 1,402) | No restrictive lung impairment (n = 1,168) | Restrictive lung impairment (n = 234)
Sex | Male: 820 (58%) | Male: 586 (50%) | Male: 147 (63%)
BMI (kg/m²) | 26.38 ± 5.31 | 26.11 ± 5.25 | 27.73 ± 5.58
FIGURE 1
Illustration of the machine learning-based algorithm for predicting total lung capacity. MSE, mean squared error.
for patients with restrictive disorders, including ILD and thoracic deformity.

3.2. Identifying restrictive ventilatory impairment

Confusion matrices for the different definitions are shown in Table 3. Of the 1,402 patients, 16.7% (234/1402) had restriction defined as TLC < LLN (5th percentile), compared to 13.8% (194/1402) flagged by our algorithm and 18.0% (252/1402) flagged by the 2005 standards definition. Following the 2005 standards, 93 unnecessary full PFTs would have been performed (PPV of 62%), versus only 35 with our method (PPV of 82%). Most of these unnecessary tests would be done in patients diagnosed with asthma (32 patients) and COPD (20 patients). These subjects typically present with a small airway obstructive syndrome or a non-specific pattern, as previously described (20, 21).

Table 4 details the performance indicators (sensitivity, specificity, PPV, and F1-score) for the studied approaches. Our baseline algorithm achieved the same sensitivity (68%) as the 2005 ERS/ATS guidelines for predicting restriction. However, our algorithm had higher specificity and attained a relatively good balance between sensitivity and PPV (F1-score of 74%). Moreover, lowering TLC estimations by subtracting α = 0.3 substantially increased the sensitivity of our algorithm from 68 to 83%. The algorithm's ability to rule out restriction was then only moderately reduced (specificity 92%).

The number of patients who would have missed necessary lung volume tests to confirm restriction under the different definitions is shown in Table 5. Across all definitions, patients diagnosed with ILD were the most susceptible to false negative results. Of the 165 patients with ILD, the 2005 ERS/ATS guideline definition missed pulmonary restriction in 33 patients. Our baseline algorithm yielded a similar result; however, when α was set to 0.3, the number of false negatives among ILD patients decreased almost threefold.

4. Discussion

To the best of our knowledge, this is the first time that spirometry data have been investigated to estimate TLC values using ML models and large data sets. Given the type of data, our findings indicate that tree-based algorithms in general are well suited to the prediction task at hand. After evaluating the models using MSE, we found that the CatBoost model performed the best.

For patients diagnosed with pulmonary vascular disease and neuromuscular disease, the mean absolute difference between TLC values predicted by CatBoost and measured volumes was the lowest, at 392.2 and 324.1 mL, respectively. However, in patients with COPD our TLC prediction model largely underestimated true TLC values. This finding might be explained by a phenomenon called pseudorestriction (22). Patients with severe obstruction may have air trapping with high residual volumes, thereby reducing FVC for a given increased TLC (4). In the test set, 228 patients were identified with low FVC (< LLN) despite normal TLC, of whom 49.6% had a diagnosis of COPD. These subjects contributed most to the underestimation observed in the upper end of Figure 2.

Considering the satisfactory performance of our TLC prediction model, we examined its ability to serve as a tool for identifying restrictive lung impairment. We incorporated a linear correction term α to account for the model's tendency to overestimate and underestimate TLC in patients with and without restriction, respectively. By subtracting a small value α from the TLC predictions, the algorithm achieved a markedly higher sensitivity (83% vs. 68%) with only a moderate reduction in specificity, thereby turning spirometry into a high-value screening test. The tuning constant α, which controls the trade-off between sensitivity and specificity, can be adjusted according to the context and priorities of different testing laboratories. For

TABLE 2 Confusion matrix for the prediction of restriction.
Ground truth | Prediction of restriction (TLC_predicted < LLN) | Prediction of no restriction (TLC_predicted ≥ LLN)
True restriction (TLC < LLN) | True positive | False negative
No true restriction (TLC ≥ LLN) | False positive | True negative
In this study, the ground truth for the prediction of a restriction is TLC < LLN.

TABLE 3 Confusion matrix for the prediction of restriction (a) based on our machine learning model and (b) based on the 2005 standards definition.
(a)
Ground truth | Prediction of restriction (TLC_predicted < LLN) | Prediction of no restriction (TLC_predicted ≥ LLN) | Total
True restriction (TLC < LLN) | 159 | 75 | 234
No true restriction (TLC ≥ LLN) | 35 | 1,133 | 1,168
Total | 194 | 1,208 | 1,402
(b)
Ground truth | Prediction of restriction (FVC < LLN and FEV₁/FVC ≥ LLN) | Prediction of no restriction (FVC ≥ LLN or FEV₁/FVC < LLN) | Total
True restriction (TLC < LLN) | 159 | 75 | 234
No true restriction (TLC ≥ LLN) | 93 | 1,075 | 1,168
Total | 252 | 1,150 | 1,402

FIGURE 2
The total lung capacity (TLC) predictions of the CatBoost model (TLC_CatBoost) against the reference TLC measurements in the independent test set, grouped by true restriction defined as TLC < lower limit of normal (LLN). The black dashed line represents the line of ideal agreement.

FIGURE 3
The prediction error for each diagnosis is calculated as the difference between the average total lung capacity (TLC) value and the average TLC_CatBoost prediction for that group. Bars above and below the horizontal dotted line indicate model underestimation and overestimation, respectively. COPD, chronic obstructive pulmonary disease; ILD, interstitial lung disease; OBD, other obstructive disease; NMD, neuromuscular disease; PVD, pulmonary vascular disease; TD, thoracic deformity.
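The decision rules and summary statistics used in Sections 3 and 4 can be sketched as follows. This is a minimal illustration, assuming TLC predictions, LLN values and α are expressed in the same unit (the text does not state the unit of α); all function names are hypothetical. It applies the α-lowered TLC rule, the 2005-standards spirometric pattern from Table 3(b), computes sensitivity, specificity, PPV and F1 from confusion-matrix counts, and summarizes per-diagnosis errors in the signed form of Figure 3 and the absolute form used in the Discussion.

```python
import numpy as np


def flag_restriction(tlc_pred, lln, alpha=0.0):
    """Flag restriction when the alpha-lowered TLC prediction falls below the LLN."""
    return np.asarray(tlc_pred) - alpha < np.asarray(lln)


def spirometric_restriction(fvc, fvc_lln, ratio, ratio_lln):
    """2005-standards pattern from Table 3(b): low FVC with a preserved FEV1/FVC ratio."""
    return (np.asarray(fvc) < np.asarray(fvc_lln)) & (np.asarray(ratio) >= np.asarray(ratio_lln))


def confusion_metrics(tp, fn, fp, tn):
    """Sensitivity, specificity, PPV and F1 from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }


def per_group_error(tlc_true, tlc_pred, groups):
    """Per-diagnosis errors; a positive mean_difference means the model underestimates TLC."""
    tlc_true, tlc_pred, groups = map(np.asarray, (tlc_true, tlc_pred, groups))
    out = {}
    for g in np.unique(groups):
        m = groups == g
        out[str(g)] = {
            "mean_difference": tlc_true[m].mean() - tlc_pred[m].mean(),
            "mean_absolute_difference": np.abs(tlc_true[m] - tlc_pred[m]).mean(),
        }
    return out


# Counts for the baseline machine learning model (alpha = 0), taken from Table 3(a).
BASELINE = confusion_metrics(tp=159, fn=75, fp=35, tn=1133)
# BASELINE, rounded: sensitivity 0.68, specificity 0.97, PPV 0.82, F1 0.74
```

Keeping α as an explicit argument mirrors the text's point that the sensitivity/specificity trade-off can be tuned per laboratory without retraining the model.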
References

1. Martinez-Pitre PJ, Sabbula BR, Cascella M. In StatPearls [Internet]. Treasure Island (FL): StatPearls Publishing (2023).
2. Guerra S, Sherrill DL, Venker C, Ceccato CM, Halonen M, Martinez FD. Morbidity and mortality associated with the restrictive spirometric pattern: a longitudinal study. Thorax. (2010) 65:499–504. doi: 10.1136/thx.2009.126052
3. Raj R, Raparia K, Lynch DA, Brown KK. Surgical lung biopsy for interstitial lung diseases. Chest. (2017) 151:1131–40. doi: 10.1016/j.chest.2016.06.019
4. Wanger J, Clausen JL, Coates A, Pedersen OF, Brusasco V, Burgos F, et al. Standardisation of the measurement of lung volumes. Eur Respir J. (2005) 26:511–22. doi: 10.1183/09031936.05.00035005
5. Pellegrino R, Viegi G, Brusasco V, Crapo RO, Burgos F, Casaburi R, et al. Interpretative strategies for lung function tests. Eur Respir J. (2005) 26:948–68. doi: 10.1183/09031936.05.00035205
6. Maher TM, Schiffman C, Kreuter M, Moor CC, Nathan SD, Axmann J, et al. A review of the challenges, learnings and future directions of home handheld spirometry in interstitial lung disease. Respir Res. (2022) 23:307. doi: 10.1186/s12931-022-02221-4
7. Moor CC, van den Berg CAL, Visser LS, Aerts JGJV, Cottin V, Wijsenbeek MS. Diurnal variation in forced vital capacity in patients with fibrotic interstitial lung disease using home spirometry. ERJ Open Res. (2020) 6:00054–2020. doi: 10.1183/23120541.00054-2020
8. Nakshbandi G, Moor CC, Wijsenbeek MS. Home monitoring for patients with ILD and the COVID-19 pandemic. Lancet Respir Med. (2020) 8:1172–4. doi: 10.1016/S2213-2600(20)30452-5
9. Giri PC, Chowdhury AM, Bedoya A, Chen H, Lee HS, Lee P, et al. Application of machine learning in pulmonary function assessment where are we now and where are we going? Front Physiol. (2021) 12:678540. doi: 10.3389/fphys.2021.678540
10. Gonem S, Janssens W, Das N, Topalovic M. Applications of artificial intelligence and machine learning in respiratory medicine. Thorax. (2020) 75:695–701. doi: 10.1136/thoraxjnl-2020-214556
11. Quanjer PH, Stanojevic S, Cole TJ, Baur X, Hall GL, Culver BH, et al. Multi-ethnic reference values for spirometry for the 3-95-yr age range: the global lung function 2012 equations. Eur Respir J. (2012) 40:1324–43. doi: 10.1183/09031936.00080312
12. Liu FT, Ting KM, Zhou ZH. Isolation forest. In: Proceedings of the 8th IEEE International Conference on Data Mining, Pisa, Italy. Los Alamitos: IEEE (2008). p. 413–422.
13. Decramer M, Janssens W, Derom E, Joos G, Ninane V, Deman R, et al. Contribution of four common pulmonary function tests to diagnosis of patients with respiratory symptoms: a prospective cohort study. Lancet Respir Med. (2013) 1:705–13. doi: 10.1016/S2213-2600(13)70184-X
14. Topalovic M, Laval S, Aerts JM, Troosters T, Decramer M, Janssens W, et al. Automated interpretation of pulmonary function tests in adults with respiratory complaints. Respiration. (2017) 93:170–8. doi: 10.1159/000454956
15. Breiman L. Random forests. Mach Learn. (2001) 45:5–32. doi: 10.1023/A:1010933404324
16. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2016). p. 785–794.
17. Prokhorenkova L, Gusev G, Vorobev A, Dorogush AV, Gulin A. CatBoost: unbiased boosting with categorical features. Adv Neural Inf Proces Syst. (2018) 31:6639–6649.
18. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res. (2012) 13:281–305.
19. Arlot S, Celisse A. A survey of cross-validation procedures for model selection. Stat Surv. (2010) 4:40–79.
20. Hyatt RE, Cowl CT, Bjoraker JA, Scanlon PD. Conditions associated with an abnormal nonspecific pattern of pulmonary function tests. Chest. (2009) 135:419–24. doi: 10.1378/chest.08-1235
21. Chevalier-Bidaud B, Gillet-Juvin K, Callens E, Chenu R, Graba S, Essalhi M, et al. Non-specific pattern of lung function in a respiratory physiology unit: causes and prevalence: results of an observational cross-sectional and longitudinal study. BMC Pulm Med. (2014) 14:148. doi: 10.1186/1471-2466-14-148
22. Al-Ashkar F, Mehra R, Mazzone PJ. Interpreting pulmonary function tests: recognize the pattern, and the diagnosis will follow. Cleve Clin J Med. (2003) 70:866, 868, 871–873, passim. doi: 10.3949/ccjm.70.10.866
23. Cosgrove GP, Bianchi P, Danese S, Lederer DJ. Barriers to timely diagnosis of interstitial lung disease in the real world: the INTENSITY survey. BMC Pulm Med. (2018) 18:9. doi: 10.1186/s12890-017-0560-x
24. Pritchard D, Adegunsoye A, Lafond E, Pugashetti JV, DiGeronimo R, Boctor N, et al. Diagnostic test interpretation and referral delay in patients with interstitial lung disease. Respir Res. (2019) 20:253. doi: 10.1186/s12931-019-1228-2
25. Tang Y, Zhang M, Feng Y, Liang B. The measurement of lung volumes using body plethysmography and helium dilution methods in COPD patients: a correlation and diagnosis analysis. Sci Rep. (2016) 6:37550. doi: 10.1038/srep37550
26. O'Donnell CR, Bankier AA, Stiebellehner L, Reilly JJ, Brown R, Loring SH. Comparison of plethysmographic and helium dilution lung volumes: which is best for COPD? Chest. (2010) 137:1108–15. doi: 10.1378/chest.09-1504