Machine learning for the prediction of preclinical airway management in injured patients: a registry-based trial

Article information

Clin Exp Emerg Med. 2022;9(4):304-313
Publication date (electronic) : 2022 November 23
doi :
1Department of Anesthesiology, Intensive Care and Emergency Medicine, Ludwigshafen Municipal Hospital, Ludwigshafen, Germany
2Center for Quality Management in Emergency Medical Services Baden-Wuerttemberg (SQR-BW), Stuttgart, Germany
3Department of Anesthesiology and Intensive Care Medicine, University Medical Center Mannheim, Mannheim, Germany
4Clinic for Anesthesia, Intensive Care and Pain Therapy, BG Trauma Center Tuebingen, Tuebingen, Germany
Correspondence to: André Luckscheiter Department of Anesthesiology, Intensive Care and Emergency Medicine, Ludwigshafen Municipal Hospital, Bremserstrasse 79, Ludwigshafen 67063, Germany E-mail:
Received 2022 June 16; Revised 2022 September 11; Accepted 2022 October 16.



The aim of this study was to determine the feasibility of using machine learning to establish the need for preclinical airway management for injured patients based on a standardized emergency dataset.


A registry-based, retrospective analysis was conducted of adult trauma patients who were treated by physician-staffed emergency medical services in southwestern Germany between 2018 and 2020. The primary outcome was the feasibility of using the random forest (RF) and Naive Bayes (NB) machine learning algorithms to predict the need for preclinical airway management. The secondary outcome was the identification, by principal component analysis, of attributes that could be used and refined in future model development.


In total, 25,556 adults with multiple injuries were identified, including 1,451 patients (5.7%) who required airway management. Key attributes were auscultation, injury pattern, oxygen therapy, thoracic drainage, noninvasive ventilation, catecholamines, pelvic sling, colloid infusion, initial vital signs, preemergency status, and shock index. The area under the receiver operating characteristic curve was 0.96 for RF (95% confidence interval [CI], 0.96–0.97) and 0.93 for NB (95% CI, 0.92–0.93; P<0.01). For the prediction of airway management, RF yielded a larger precision-recall area than NB (0.83 [95% CI, 0.80–0.85] vs. 0.66 [95% CI, 0.61–0.72], respectively; P<0.01).


To predict the need for preclinical airway management in injured patients, attributes that are commonly recorded in standardized datasets can be used with machine learning. In future models, the RF algorithm could be used because it has robust prediction accuracy.


International guidelines recommend preclinical airway management as a potential life-saving procedure for severely injured patients with traumatic brain injury and a Glasgow Coma Scale (GCS) <9; severe respiratory insufficiency, for example, due to thoracic trauma or airway injuries; or trauma-associated shock [1-3]. However, preclinical airway management is a high-risk procedure due to imminent hypoxia, challenging environmental conditions, and varying clinician experience in managing difficult airway situations [4,5]. Because hemodynamic conditions and the patient’s state of awareness can change quickly, preclinical trauma care is a highly dynamic situation. Therefore, an ability to predict or exclude the need for airway management would assist decision-making.

In recent years, several machine learning models that predict the need for endotracheal intubation in intensive care patients have been published. They are based on electronic medical records and common clinical hemodynamic and laboratory parameters [6-9]. To our knowledge, no such model exists for preclinical trauma care.

German emergency medical services are divided into paramedic and emergency physician systems (ground or air), which are dispatched by the rescue coordination center in parallel or sequentially depending on the emergency. By law, certain medical interventions, such as drug therapy or airway management, are restricted to emergency physicians except during resuscitation or when no emergency physician is available. German emergency physicians come mainly from specialties such as anesthesiology, internal medicine, and surgery. The emergency medicine qualification can be obtained in parallel with specialist training after two years of clinical practice, which must include at least a 6-month rotation in the accident and emergency department or intensive care unit [5,10]. For quality improvement, the German state of Baden-Wuerttemberg (population, 11.1 million in 2020; area, 35,751 km²; capital, Stuttgart) created a Center for Quality Management in Emergency Medical Services in 2011. Since then, all paramedics and preclinical emergency physicians have been required to document their missions anonymously and digitally in the minimal emergency dataset (MIND) [10,11]. The MIND is used throughout Germany, and it contains internationally standardized examination findings, diagnoses, and interventions that are also used in the German Trauma Registry and the German Resuscitation Registry. The MIND is divided into subcategories following the ABCDE approach of Advanced Trauma Life Support at first contact and at hospital admission, and it is supplemented by a free-text anamnesis and a record (including a vital signs chart) of pharmaceutical therapy and medical interventions; it therefore provides nationwide, standardized emergency documentation. Although the free-text and record sections are not available digitally, the MIND appears suitable for machine learning research.

Therefore, the aim of this study was to evaluate the feasibility of building machine learning models to predict the need for preclinical airway management in trauma patients. As a first step, attributes of the MIND that define patients who need preclinical airway management were identified. Second, two machine learning algorithms were trained and tested to assess the accuracy of the resulting models.


Ethical statements

This study is reported based on the TRIPOD (Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis) statement [12]. The trial was approved by the Ethics Committee of the State Medical Association of Rhineland-Palatinate (No. 2021-15767-retrospektiv). The study is a retrospective registry analysis with anonymized data. Informed consent was waived due to the retrospective nature of the study.

Design and setting

Adult patients with multiple injuries who were primarily treated by a physician-staffed ground or air ambulance from 2018 to 2020 were selected from the MIND. Patients who died and those requiring resuscitation were excluded. Briefly, the MIND records of the remaining patients were preprocessed for attribute selection using medical causality and a principal component analysis (PCA). Using the resulting attributes, Naive Bayes (NB) and random forest (RF) models were trained and tested to determine how accurately they predicted whether the injured patients received preclinical airway management. Patient selection, dataset creation, and the analyses are illustrated in Fig. 1.

Fig. 1.

Flowchart for patient selection, dataset creation, and analysis. SQR-BW, Center for Quality Management in Emergency Medical Services Baden-Wuerttemberg; MIND, minimal emergency dataset; SMOTE, synthetic minority oversampling technique. a) A total of 24 attributes were included: >550 attributes were filtered by causality or potential correlation and then selected by principal component analysis (wrapper method).


The MIND does not yet contain anesthesia as an attribute. Therefore, emergency general anesthesia in an injured patient was defined as any of the following: documented invasive airway management; positive end-tidal CO2 at admission without noninvasive ventilation (NIV); documented invasive ventilation at admission together with the use of a muscle relaxant; or any use of a muscle relaxant. The main assumption was that preclinical emergency anesthesia had been correctly indicated.
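The outcome definition can be read as a simple labeling rule. The sketch below is a hypothetical illustration only: the `MindRecord` fields are invented names rather than actual MIND attribute codes, and it assumes that each criterion is available as a yes/no flag.

```python
from dataclasses import dataclass


@dataclass
class MindRecord:
    # Hypothetical yes/no flags; the real MIND uses its own attribute codes.
    invasive_airway_documented: bool = False
    etco2_positive_at_admission: bool = False
    niv_at_admission: bool = False
    invasive_ventilation_at_admission: bool = False
    muscle_relaxant_given: bool = False


def received_emergency_anesthesia(r: MindRecord) -> bool:
    """Outcome label: True if any of the four criteria in the text is met."""
    if r.invasive_airway_documented:
        return True
    # Positive end-tidal CO2 counts only when it is not explained by NIV.
    if r.etco2_positive_at_admission and not r.niv_at_admission:
        return True
    # Invasive ventilation plus a muscle relaxant (already covered by the
    # last criterion, but kept to mirror the written definition).
    if r.invasive_ventilation_at_admission and r.muscle_relaxant_given:
        return True
    # Any documented use of a muscle relaxant.
    return r.muscle_relaxant_given
```

Writing the rule out this way makes visible that the third criterion is subsumed by the fourth, so in practice the label reduces to three independent triggers.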

Attribute selection and data preprocessing

The MIND includes more than 550 anonymized attributes, including the specialization of the physician, standardized clinical examination findings, medical diagnoses, injury patterns by body region (classified as none, mild, moderate, severe, or deadly by the attending physicians), blunt or penetrating trauma, and vital signs at first contact and hospital admission, including the GCS, heart rate, systolic blood pressure, respiratory rate, oxygen saturation, end-tidal CO2, temperature, blood glucose level, and pain level. Furthermore, electrocardiogram findings (at first contact and hospital admission), medication (without dosage or timing), treatment (NIV, invasive airway management, thoracic drainage, pelvic sling), infusion therapy (crystalloid/colloid infusion, blood products), age, preemergency status (PES; a preclinical adaptation of the American Society of Anesthesiologists physical status classification), time on site, and transport time are recorded in the dataset [13].

Datasets of patients with cardiac arrest were excluded because decisions to withhold resuscitation could bias the weighting of certain attributes. Only datasets documenting at least two of the three attributes initial GCS, systolic blood pressure, and oxygen saturation were included, because these parameters correspond to the guideline criteria [1-3].
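A minimal sketch of the inclusion rule, assuming each case is a dictionary with hypothetical keys for the three initial vital signs (`None` meaning not documented) and a separate cardiac arrest flag:

```python
from typing import Mapping, Optional

# Hypothetical keys for the three guideline-relevant initial vital signs.
REQUIRED_VITALS = ("gcs_initial", "sbp_initial", "spo2_initial")


def is_eligible(case: Mapping[str, Optional[float]], cardiac_arrest: bool) -> bool:
    """Keep a case if there was no cardiac arrest and >= 2 of 3 vitals exist."""
    if cardiac_arrest:
        return False
    documented = sum(case.get(key) is not None for key in REQUIRED_VITALS)
    return documented >= 2
```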

In data preprocessing, attributes in the training set with potential correlations but no medical causality (e.g., place of accident) were excluded from the machine learning analysis, as were causal attributes that occurred only rarely in one of the two classes. Attributes correlating with indications for airway management were identified using international guidelines on respiratory, neurological, or hemodynamic findings and injury patterns [2,3,14]. Because critical volume loss and (developing) shock are not directly recorded in the MIND, surrogate parameters such as pelvic sling or tranexamic acid were also included.

Imputation of missing data was not considered because most attributes are nominal. Because the remaining attributes contributed with different weightings, a PCA was performed on the whole dataset using the wrapper method with a bidirectional search and a C4.5 decision tree (J48) with tenfold cross-validation (settings in Supplementary Table 2) [15]. The Java-based software Weka ver. 3.8.4 (University of Waikato, Hamilton, New Zealand) was used for the PCA and machine learning [16,17]. Attributes were compared between the two classes (airway management and no airway management) with the chi-square test, Mann-Whitney U-test, or t-test, as appropriate, in Microsoft Excel (Microsoft Corp., Redmond, WA, USA). A P-value of less than 0.05 was considered significant. Continuous variables are expressed as means and standard deviations, and categorical variables as percentages.
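The wrapper principle (scoring candidate attribute subsets by the cross-validated performance of the learner itself) can be illustrated with a simplified greedy search that may add or remove one attribute per step. This is an assumption-laden sketch, not Weka's implementation: Weka's bidirectional best-first search additionally keeps a queue of visited subsets and allows limited backtracking, and here the scoring function `evaluate` stands in for, e.g., the tenfold cross-validated accuracy of a J48 tree. The attribute names and `toy_score` are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable

Evaluate = Callable[[FrozenSet[str]], float]


def bidirectional_wrapper_search(
    attributes: Iterable[str],
    evaluate: Evaluate,
    start: FrozenSet[str] = frozenset(),
) -> FrozenSet[str]:
    """Greedy stepwise subset search that may add or drop one attribute per step.

    `evaluate` should train the wrapped learner on the subset and return a
    cross-validated score (higher is better).
    """
    pool = frozenset(attributes)
    current = frozenset(start)
    best_score = evaluate(current)
    while True:
        # Forward moves (add one unused attribute) and backward moves (drop one).
        candidates = [current | {a} for a in sorted(pool - current)]
        candidates += [current - {a} for a in sorted(current)]
        if not candidates:
            return current
        scored = [(evaluate(c), c) for c in candidates]
        candidate_score, candidate = max(scored, key=lambda pair: pair[0])
        if candidate_score <= best_score:
            return current  # no single move improves the score
        current, best_score = candidate, candidate_score


# Toy usage: a made-up score that rewards three informative attributes.
INFORMATIVE = {"gcs", "head_injury", "sbp"}


def toy_score(subset: FrozenSet[str]) -> float:
    return len(subset & INFORMATIVE) - 0.1 * len(subset - INFORMATIVE)


ATTRIBUTES = ["gcs", "head_injury", "place_of_accident", "sbp"]
selected = bidirectional_wrapper_search(
    ATTRIBUTES, toy_score, start=frozenset({"place_of_accident"})
)
```

Starting from a subset that contains the uninformative `place_of_accident` shows the backward moves at work: the search first adds the informative attributes and then drops the uninformative one again.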

Class balancing, training, and testing

The data were randomly split 10 times into a 60% training set and a 40% test set to assess the performance of the algorithms across different proportions of invasively ventilated patients. In general, machine learning algorithms tend to learn and predict the majority class, whereas most studies are interested in the minority class. To address this class imbalance for the minority class that received airway management, the synthetic minority oversampling technique (SMOTE) was used to triple the airway management class in the training sets, but not in the test sets. SMOTE creates each new minority instance by interpolating between an existing minority instance and one of its k=5 nearest minority neighbors (Supplementary Table 3) [18]. This procedure was chosen because Weka does not offer a cross-validation that applies SMOTE to the training data but not to the test data. Tripling the minority class was considered appropriate to improve the predictions while limiting overfitting. For supervised machine learning, the NB and RF methods were chosen (Supplementary Table 4). Both algorithms can handle missing values.
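The split-then-oversample procedure can be sketched as follows. This is a simplified SMOTE for numeric attributes only, written from the general description of the technique; how Weka's SMOTE filter treats the nominal MIND attributes is not reproduced here, and the toy minority cases are invented. Tripling corresponds to generating two synthetic cases per original minority case, and only the training portion is oversampled.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def random_split(n: int, train_fraction: float, rng: random.Random) -> Tuple[List[int], List[int]]:
    """Shuffle case indices and cut them into training and test indices."""
    indices = list(range(n))
    rng.shuffle(indices)
    cut = round(train_fraction * n)
    return indices[:cut], indices[cut:]


def smote(minority: Sequence[Vector], n_new: int, k: int, rng: random.Random) -> List[Vector]:
    """Generate n_new synthetic minority cases by SMOTE-style interpolation.

    Numeric attributes only; each new case lies on the line segment between a
    minority case and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority cases")
    synthetic: List[Vector] = []
    for i in range(n_new):
        b = i % len(minority)  # cycle through the minority cases as base points
        base = minority[b]
        others = [j for j in range(len(minority)) if j != b]
        nearest = sorted(others, key=lambda j: math.dist(base, minority[j]))[:k]
        neighbour = minority[rng.choice(nearest)]
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append([x + gap * (y - x) for x, y in zip(base, neighbour)])
    return synthetic


# Tripling the minority class = adding two synthetic cases per original case,
# applied to the training portion only; the test indices stay untouched.
rng = random.Random(42)
train_idx, test_idx = random_split(100, 0.6, rng)
minority_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8]]
augmented = minority_train + smote(minority_train, 2 * len(minority_train), k=5, rng=rng)
```

Keeping the oversampling outside the test indices matters: synthetic cases interpolated from test patients would leak information into the evaluation.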

Model performance

All results are presented as means with 95% confidence intervals (CIs). As performance criteria, overall correctness (accuracy), kappa value, area under the receiver operating characteristic curve (AUC-ROC), sensitivity (need for airway management), specificity (no need for airway management), positive predictive value (PPV), negative predictive value (NPV), and precision-recall curve (PRC) area were chosen [15]. The Matthews correlation coefficient (MCC) was used to measure classification quality for two classes of very different sizes (range: –1, total disagreement; 0, random prediction; +1, perfect prediction) [19]. The decision threshold of the RF algorithm was determined automatically by a cost-benefit calculation that minimized the overall error rate. The performance across all 10 test sets was averaged and compared with a t-test (P<0.05 considered significant, calculated in Microsoft Excel).
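The threshold-dependent criteria follow standard definitions and can all be computed from one 2×2 confusion matrix per test set. The sketch below does this and then summarizes one metric across the 10 test sets. The t-based interval with 9 degrees of freedom is an assumption, because the manuscript does not state how the 95% CIs were derived; the ROC and PRC areas need the full ranking of predicted probabilities and are omitted.

```python
import math
import statistics
from typing import Dict, Sequence, Tuple


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Metrics from one 2x2 confusion matrix (positive = airway management)."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Chance agreement for Cohen's kappa from the row and column marginals.
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "kappa": (accuracy - p_chance) / (1 - p_chance),
        "mcc": (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0,
    }


# Two-sided 95% t quantile for 9 degrees of freedom, i.e. 10 test sets.
T_975_DF9 = 2.2622


def mean_with_ci(values: Sequence[float], t_quantile: float = T_975_DF9) -> Tuple[float, float, float]:
    """Mean and t-based 95% CI of one metric across the repeated test sets."""
    mean = statistics.mean(values)
    half_width = t_quantile * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width
```

Unlike accuracy, both kappa and the MCC correct for the agreement expected from the class proportions alone, which is why they are more informative at a prevalence of about 6%.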


Of more than 130,000 injured patients, 26,765 patients with multiple injuries were selected. Of these, 869 resuscitations, 6 fatal cases, and 335 insufficiently documented datasets were excluded, leaving 25,556 datasets with 1,451 cases (5.67%) of airway management.

Data preprocessing identified 31 attributes with potential correlation or medical causality. In the PCA, 24 attributes were selected, among them auscultation, injury pattern (excluding upper limbs and soft tissues), oxygen therapy, NIV, tranexamic acid and catecholamines, pelvic sling, vital signs, PES, and shock index. Except for initial systolic blood pressure and respiratory rate (P>0.05), the groups with and without airway management differed significantly (Table 1). For further information about nonselected attributes, see Supplementary Table 1.

Table 1.

Clinical findings and medical treatments for both classes with the attributes selected through the principal component analysis

In overall correctness, the RF outperformed the NB (97.8% [95% CI, 97.57%–98.03%] vs. 93.55% [95% CI, 93.11%–93.99%], respectively; P<0.01). The RF reached a significantly higher kappa value (0.78 [95% CI, 0.75–0.80]) than the NB (0.54 [95% CI, 0.52–0.56]; P<0.01). In the AUC-ROC analysis, the RF reached 0.96 (95% CI, 0.96–0.97), and the NB reached 0.93 (95% CI, 0.92–0.93; P<0.01) (Fig. 2A). Furthermore, the RF model had a significantly higher MCC than the NB approach (0.78 [95% CI, 0.76–0.80] vs. 0.56 [95% CI, 0.54–0.57], respectively; P<0.01).

Fig. 2.

Averaged (A) receiver operating characteristic curves for the overall performance and (B) precision-recall curves for the prediction of airway management by the Naive Bayes and random forest algorithms. AUC, area under the curve; CI, confidence interval.

In sensitivity for airway management, the NB and RF did not differ significantly (0.75 [95% CI, 0.73–0.76] vs. 0.73 [95% CI, 0.71–0.76], respectively; P=0.38). The RF achieved the better PPV (0.85 [95% CI, 0.84–0.87] vs. NB, 0.46 [95% CI, 0.44–0.49]; P<0.01), which also resulted in a larger PRC area for the RF (0.83 [95% CI, 0.80–0.85] vs. NB, 0.66 [95% CI, 0.61–0.72]; P<0.01) (Fig. 2B).

For excluding the need for airway management, both algorithms yielded a very high specificity (RF, 0.993 [95% CI, 0.992–0.994] vs. NB, 0.947 [95% CI, 0.942–0.952]; P<0.01), a high NPV (RF, 0.984 [95% CI, 0.980–0.987] vs. NB, 0.984 [95% CI, 0.983–0.985]; P=0.85), and a high PRC area for the no-airway-management class (RF, 0.996 [95% CI, 0.996–0.997] vs. NB, 0.992 [95% CI, 0.992–0.993]; P<0.01) (Table 2).

Table 2.

Model performance and evaluation of random forest versus Naive Bayes

The average threshold of the RF model was 0.51 (95% CI, 0.49–0.53). Due to the decision process used by the NB, no average threshold can be given for it. The three most important attributes in the RF were systolic blood pressure (0.306±0.019), head injury (0.305±0.013), and initial heart rate (0.294±0.018) (Fig. 3).
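Weka's cost/benefit analysis is an interactive tool, and the exact procedure behind the average threshold of 0.51 is not described in detail. The sketch below only illustrates the idea under the assumption of equal misclassification costs: it scans the observed predicted probabilities of one test set as candidate cut-offs and keeps the one with the fewest errors. The function name is hypothetical.

```python
from typing import Sequence


def min_error_threshold(probabilities: Sequence[float], labels: Sequence[int]) -> float:
    """Cut-off on P(airway management) that minimises misclassifications.

    Equal costs for false positives and false negatives are assumed; a case is
    predicted positive when its probability is >= the cut-off.
    """
    best_cutoff, best_errors = 0.5, len(labels) + 1
    for cutoff in sorted(set(probabilities)):
        errors = sum((p >= cutoff) != bool(y) for p, y in zip(probabilities, labels))
        if errors < best_errors:
            best_cutoff, best_errors = cutoff, errors
    return best_cutoff
```

Assigning a higher cost to missed airway management cases would shift the cut-off downward, trading PPV for sensitivity.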

Fig. 3.

Attribute weighting in the random forest model, given as means with standard deviation error bars.


This study set out to develop a decision model for determining the necessity of preclinical airway management in adult trauma patients. Commonly recorded preclinical attributes such as injury pattern, certain examination findings, vital signs, and emergency medical interventions were the most influential in forecasting the need for preclinical airway management. Both models developed here showed excellent results in excluding the need for airway management, but only the RF model had satisfactory accuracy in predicting it. Therefore, the feasibility of using machine learning to predict the need for airway management in preclinical trauma patients has been confirmed, but the models need further development. Nonetheless, even before a final model can be implemented in electronic medical records, the attributes determined here can already be used clinically to alert emergency physicians to trauma patients at increased risk of requiring airway management. For example, the absence of severe head or thoracic injury, catecholamine therapy, thoracic drainage, or NIV could justify deferring the evaluation of airway protection. To the best of our knowledge, this analysis is the first to use machine learning to forecast airway management in a preclinical environment. However, several factors need to be considered when interpreting and extending the results.

Database, attribute selection, and model comparison

The more distinct the pathological findings in the initial parameters, the better the algorithms could classify the cases. However, differences in attributes such as GCS or oxygen saturation were marginal, and their averages were within the physiological range, a finding also partly reported in other clinical modeling studies [8,20,21]. This may be explained by delayed documentation of vital signs that had already been stabilized by paramedics.

Attribute choice is always a compromise between overgeneralization (selecting only attributes with strong correlation or causality) and overfitting (selecting many attributes, even those with weak correlation). The PCA in this study selected attributes with strong indirect correlations with airway management. For example, the use of catecholamines can be interpreted as a surrogate for hemodynamic instability before or after airway management in emergency anesthesia. Other surrogates for potential blood loss were colloid infusion, pelvic sling, and tranexamic acid (tourniquet use is not recorded in the MIND). NIV can be interpreted either as a surrogate for respiratory failure or as a method of preoxygenation. Although the shock index is only moderately reliable for diagnosing shock, it carried weight in combination with other attributes [22,23]. Because preclinical emergency physicians in Germany usually lack point-of-care and radiographic findings, they must rely on a less reliable clinical examination with baseline vital signs for their time-critical decisions. The surrogate parameters used in this study can therefore be seen as a replacement for real-time vital signs. They also partly reflect the recommendations for airway management in patients with traumatic respiratory disorders, brain injury, and shock [1-3]. Future prediction models for preclinical airway management should combine attributes emphasized in the guidelines with selected surrogates that reflect the dynamics of preclinical emergency medicine, to compensate for the lack of real-time parameters.
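The shock index is conventionally defined as heart rate divided by systolic blood pressure. The manuscript does not state how it was derived from the MIND, so the sketch below simply assumes this standard definition applied to the initial vital signs.

```python
def shock_index(heart_rate: float, systolic_bp: float) -> float:
    """Shock index = heart rate (beats/min) / systolic blood pressure (mmHg)."""
    if systolic_bp <= 0:
        raise ValueError("systolic blood pressure must be positive")
    return heart_rate / systolic_bp
```

Because it combines two vital signs into one ratio, the shock index can carry information even when each component alone differs only marginally between groups.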

Compared with other studies, a main distinction of this study is the restriction to initial vital signs and the adaptation to preclinical conditions [2,3]. Siu et al. [20] additionally used blood gas analysis and sequential organ failure assessments at multiple time points in their RF model to predict the need for intubation in the first 24 hours after critical care admission (sensitivity, 0.88; specificity, 0.66; AUC-ROC, 0.86; PPV, 0.73; NPV, 0.85). Arvind et al. [6] reported an AUC-ROC of 0.84 and a PRC area of 0.3 for their RF model predicting mechanical ventilation in COVID-19 patients based on vital signs and blood gas analysis. In neonatal intensive care, Clark et al. [8] demonstrated a boosted logistic regression model with an AUC-ROC of 0.84. Politano et al. [21] predicted urgent intubation in a trauma intensive care unit with an AUC-ROC of 0.770 to 0.865 using boosted logistic regression with multiple sampling windows for vital signs along with age, oxygen partial pressure, and days since extubation.

Model performance

With regard to the performance of both algorithms, several factors about their basic method of calculation and the prevalence of airway management must be considered. In this study, the ROC curve alone overestimates model performance because of the class imbalance (94% without emergency anesthesia), which is reflected in the very high specificities and negative predictive values. Therefore, the quality of minority-class prediction is best evaluated by the PRC area, which showed that the RF had robust prediction accuracy [24].
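Why a high specificity does not translate into a high PPV at a prevalence of about 5.7% follows from Bayes' theorem. The sketch plugs the averaged sensitivity, specificity, and prevalence reported in the Results into the standard formulas, assuming that the values 0.73 (RF) and 0.75 (NB) refer to sensitivity. Because an averaged PPV need not equal the PPV of averaged inputs, it reproduces the reported values only approximately: a PPV of about 0.86 for the RF and 0.46 for the NB, with NPVs of about 0.984 for both.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple:
    """PPV and NPV implied by sensitivity, specificity and class prevalence."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp / (tp + fp), tn / (tn + fn)


# Averaged values from the Results; prevalence 1,451 / 25,556.
PREVALENCE = 1451 / 25556
for name, sens, spec in (("RF", 0.73, 0.993), ("NB", 0.75, 0.947)):
    ppv, npv = predictive_values(sens, spec, PREVALENCE)
    print(f"{name}: PPV {ppv:.2f}, NPV {npv:.3f}")
```

A drop in specificity from 0.993 to 0.947 barely moves the ROC curve, yet with roughly 16 negative cases per positive case it multiplies the false positives about sevenfold, which is exactly what the precision-recall view exposes.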

The basic assumption of the NB is that all attributes are conditionally independent of one another given the class. Such independence is almost never found in real-world data. In this study, the auscultation findings, respiratory rate, and oxygen saturation all influence one another, as do the GCS score and face and/or head injury. The NB decides for or against a class by multiplying the prior probability of each class by the conditional probabilities of the individual attribute values (equivalently, summing their logarithms) and choosing the class with the higher posterior probability. Correlated attributes are thereby counted as independent evidence, which distorts the estimated probabilities and may explain the poorer performance of the NB shown here. The advantages of an NB approach are its fast calculation and simple implementation. Also, for each numeric attribute, the mean and variance are estimated independently of all other variables [15].
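A minimal naive Bayes for nominal attributes makes the independence assumption concrete. This is an illustrative sketch with Laplace smoothing, not Weka's NaiveBayes (which by default models numeric attributes with normal distributions), and the toy data are invented. Duplicating a perfectly correlated attribute adds no information, yet it raises the posterior for a "severe" injury from 2/3 to 0.8 because the same evidence is counted twice.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


class CategoricalNaiveBayes:
    """Minimal naive Bayes for nominal attributes with Laplace smoothing."""

    def fit(self, rows: Sequence[Sequence[Hashable]], labels: Sequence[Hashable]) -> "CategoricalNaiveBayes":
        self.class_counts = Counter(labels)
        self.n = len(labels)
        self.value_counts: Dict = defaultdict(Counter)  # (attribute, class) -> value counts
        self.domains: Dict = defaultdict(set)  # attribute -> observed values
        for row, y in zip(rows, labels):
            for i, v in enumerate(row):
                self.value_counts[(i, y)][v] += 1
                self.domains[i].add(v)
        return self

    def log_posteriors(self, row: Sequence[Hashable]) -> Dict[Hashable, float]:
        scores = {}
        for y, n_y in self.class_counts.items():
            score = math.log(n_y / self.n)  # log prior
            for i, v in enumerate(row):
                # Independence assumption: each attribute adds its own log-likelihood.
                count = self.value_counts[(i, y)][v]
                score += math.log((count + 1) / (n_y + len(self.domains[i])))
            scores[y] = score
        return scores

    def predict_proba(self, row: Sequence[Hashable]) -> Dict[Hashable, float]:
        scores = self.log_posteriors(row)
        top = max(scores.values())
        weights = {y: math.exp(s - top) for y, s in scores.items()}  # stable normalisation
        total = sum(weights.values())
        return {y: w / total for y, w in weights.items()}

    def predict(self, row: Sequence[Hashable]) -> Hashable:
        scores = self.log_posteriors(row)
        return max(scores, key=scores.get)


# Toy data: one nominal attribute, then the same attribute duplicated.
injury = ["severe", "severe", "severe", "none", "none", "none", "severe", "none"]
airway = [1, 1, 1, 0, 0, 0, 0, 1]
single = CategoricalNaiveBayes().fit([[v] for v in injury], airway)
doubled = CategoricalNaiveBayes().fit([[v, v] for v in injury], airway)
p_single = single.predict_proba(["severe"])[1]  # about 0.67
p_doubled = doubled.predict_proba(["severe", "severe"])[1]  # 0.8
```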

Unlike the NB, an RF does not assume independence. Decision trees have the advantage of using the same attributes at different levels in different dependencies. In contrast to a single decision tree, an RF uses bagging: multiple randomized trees are each trained on a bootstrap sample and each make a prediction, and these predictions are then averaged to reach a final decision. This explains not only why the RF outperformed the NB but also why certain attributes whose between-group differences were marginal received high weights. The same effects also appear in the PCA, because it also uses a decision tree model. Furthermore, the RF is robust to outliers, works well with nonlinear data, and has a lower risk of overfitting than single decision trees. As a result, the RF could handle even the relatively low prevalence of airway management cases in the test sets, achieved a good PRC area, and performed robustly [15,25]. Because the prevalence differed between the test sets, the resulting RF models differ, and no single final model can be given.
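The bagging and vote-averaging steps can be sketched in a few lines. This is not Weka's RandomForest: it uses depth-one trees (stumps) instead of fully grown trees and draws the random attribute subset once per tree rather than at every split (equivalent here only because a stump has a single split). The toy data, in which attribute 0 separates the classes and attribute 1 is noise, are invented.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

Stump = Tuple[int, float, int, int]  # (attribute, threshold, left label, right label)


def fit_stump(rows: Sequence[Sequence[float]], labels: Sequence[int], features: Sequence[int]) -> Stump:
    """Depth-one tree: the single split with the fewest misclassifications."""
    best = None
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            left_label = Counter(left).most_common(1)[0][0]
            right_label = Counter(right).most_common(1)[0][0] if right else left_label
            errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
            if best is None or errors < best[0]:
                best = (errors, (f, t, left_label, right_label))
    return best[1]


def fit_forest(rows, labels, n_trees: int, n_features: int, rng: random.Random) -> List[Stump]:
    forest = []
    for _ in range(n_trees):
        # Bagging: bootstrap sample of the same size, drawn with replacement.
        idx = [rng.randrange(len(rows)) for _ in range(len(rows))]
        # Random attribute subset for this tree.
        features = rng.sample(range(len(rows[0])), n_features)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], features))
    return forest


def forest_probability(forest: Sequence[Stump], row: Sequence[float]) -> float:
    """Fraction of trees voting for class 1 (averaged prediction)."""
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return sum(votes) / len(votes)


# Toy data: attribute 0 separates the classes, attribute 1 is noise.
rows = [[i / 10, (i * 7 % 10) / 10] for i in range(20)]
labels = [1 if x >= 1.0 else 0 for x, _ in rows]
forest = fit_forest(rows, labels, n_trees=25, n_features=1, rng=random.Random(1))
```

Trees that happen to draw the noise attribute cast the same vote for any two cases with equal noise values, so averaging lets the informative trees decide the difference between cases.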

Further limitations

Because of the limitations discussed above and below, this study represents only a first attempt to build a sustainable, general model for predicting preclinical airway management. Overreliance on machine learning in high-risk situations can create hazards for patients. Future models are also needed for patients with medical and neurological emergencies. These results were obtained in a physician-staffed emergency medical system and therefore cannot simply be transferred to paramedic systems [26]. The weighting of certain attributes could change with alterations in clinical practice. The timing of interventions is missing from the MIND, which limits the applicability of the models presented here. Unlike in previous prediction models for resuscitation, attributes such as trauma site were not included in the data used here. Whereas in resuscitation the site of cardiac arrest is directly linked to bystander cardiopulmonary resuscitation, there is no such correlation between trauma site or mechanism and airway management, only between trauma severity and airway management [3,27]. Unfortunately, severity can be assessed only by the primary physical examination and not by later radiographic findings and hospital data. Although this study used data from a statewide emergency medical service, no independent external test set from another German region was used. Therefore, conclusions about stability with regard to noise and overfitting must be drawn cautiously. Unlike in other studies, imputation of missing values was not reasonable in this study, mainly because the attributes are static nominal, binary, or ordinal [6,20]. Whether emergency physicians postponed endotracheal intubation because of a potentially difficult airway or a lack of experience cannot be determined because no further clinical records were available [5]. Also, the correct indication for airway management and the primary assessment according to the ABCDE algorithm could not be checked in every case because of the retrospective design and dataset structure.

In machine learning, deep neural networks have recently outperformed classical supervised approaches such as the RF in several applications. However, such deep learning models require large amounts of data and computing power, and network design is complex, unstandardized, and time-consuming. Because this study addressed a simple binary problem and the data structure was inconsistent, RF and NB were chosen. The supplementary data contain a first approach with a deep learning neural network, but it performed worse than the RF in predicting the need for airway management (Supplementary Table 5 and Supplementary Fig. 1). Nonetheless, a deep learning application might be suitable for future models, especially with real-time attributes [15].


In conclusion, this study has shown the feasibility of using a machine learning model to predict the need for airway management in injured patients. The RF model combined a satisfactory prediction performance with an excellent ability to exclude the need for airway management in trauma patients. Because the many attributes available can be a hindrance in quickly assessing trauma patients, models such as those presented here could already be used as surveillance tools in the background or to transmit the intubation probability to the receiving hospital, where additional resources could be activated. Embedded in a continuous electronic medical record and expanded by data on patients with medical emergencies, real-time parameters, and point-of-care tests, an RF-based prediction model could become more reliable and support preclinical decision-making or quality management. In the future, such a machine learning model could help identify patients at risk early.


Supplementary materials are available at Further supplementary data, including single random forest models, are available upon reasonable request via e-mail. Due to data protection, the datasets cannot be published, but research with the database is possible upon request to the Center for Quality Management in Emergency Medical Services Baden-Wuerttemberg (SQR-BW).

Supplementary Table 1.

All recorded attributes and their values together with the class comparison and reason for exclusion


Supplementary Table 2.

Settings of the principal component analysis in Weka


Supplementary Table 3.

Settings of the SMOTE algorithm in Weka


Supplementary Table 4.

Settings of the random forest and Naive Bayes models in Weka


Supplementary Table 5.

Performance of two deep learning networks before and after attribute selection


Supplementary Fig. 1.

Averaged receiver operating characteristic (ROC) curves for (A) the overall performance and (B) averaged precision-recall (PRC) curves for the prediction of airway management by the Naive Bayes, random forest, and deep learning neural network (one dense layer with six neurons) algorithms after attribute selection.




No potential conflict of interest relevant to this article was reported.


This work was supported by the Department of Anesthesiology, Operative Intensive Care Medicine and Emergency Medicine, Ludwigshafen Municipal Hospital.


Conceptualization: AL; Data curation: AL, TL, JE; Formal analysis: AL; Funding acquisition: WZ; Investigation: AL; Methodology: AL; Project administration: TV; Resources: TL, JE; Software: AL; Supervision: WZ, MT, TV; Validation: AL; Visualization: AL; Writing–original draft: AL; Writing–review & editing: WZ, TL, JE, MT, TV.

All authors read and approved the final manuscript.


1. Crewdson K, Rehn M, Lockey D. Airway management in prehospital critical care: a review of the evidence for a ‘top five’ research priority. Scand J Trauma Resusc Emerg Med 2018;26:89.
2. Rehn M, Hyldmo PK, Magnusson V, et al. Scandinavian SSAI clinical practice guideline on pre-hospital airway management. Acta Anaesthesiol Scand 2016;60:852–64.
3. Polytrauma Guideline Update Group. Level 3 guideline on the treatment of patients with severe/multiple injuries: AWMF Register-Nr. 012/019. Eur J Trauma Emerg Surg 2018;44(Suppl 1):3–271.
4. Shavit I, Levit B, Basat NB, Lait D, Somri M, Gaitini L. Establishing a definitive airway in the trauma patient by novice intubators: a randomised crossover simulation study. Injury 2015;46:2108–12.
5. Luckscheiter A, Lohs T, Fischer M, Zink W. Airway management in preclinical emergency anesthesia with respect to specialty and education. Anaesthesist 2020;69:170–82.
6. Arvind V, Kim JS, Cho BH, Geng E, Cho SK. Development of a machine learning algorithm to predict intubation among hospitalized patients with COVID-19. J Crit Care 2021;62:25–30.
7. Bolourani S, Brenner M, Wang P, et al. A machine learning prediction model of respiratory failure within 48 hours of patient admission for COVID-19: model development and validation. J Med Internet Res 2021;23:e24246.
8. Clark MT, Vergales BD, Paget-Brown AO, et al. Predictive monitoring for respiratory decompensation leading to urgent unplanned intubation in the neonatal intensive care unit. Pediatr Res 2013;73:104–10.
9. Rahimian F, Salimi-Khorshidi G, Payberah AH, et al. Predicting the risk of emergency admission with machine learning: development and validation using linked electronic health records. PLoS Med 2018;15:e1002695.
10. Luckscheiter A, Lohs T, Fischer M, Zink W. Preclinical emergency anesthesia: a current state analysis from 2015–2017. Anaesthesist 2019;68:270–81.
11. Messelken M, Schlechtriemen T, Arntz HR, et al. Minimal data set in German Emergency Medicine MIND3. Notf Rett Med 2011;14:647–54.
12. Collins GS, Reitsma JB, Altman DG, Moons KG. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): the TRIPOD statement. BMJ 2015;350:g7594.
13. Abouleish AE, Leib ML, Cohen NH. ASA provides examples to each ASA physical status class. ASA Newsl 2015;79:38–49.
14. Timmermann A, Bottiger BW, Byhahn C, et al. German guideline for prehospital airway management (short version). Anasthesiol Intensivmed 2019;6:316–36.
15. Witten IH, Frank E, Hall MA, Pal CJ. Data mining: practical machine learning tools and techniques. 4th ed. Cambridge, MA: Morgan Kaufmann; 2017.
16. Langer T, Favarato M, Giudici R, et al. Development of machine learning models to predict RT-PCR results for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) in patients with influenza-like symptoms using only basic clinical data. Scand J Trauma Resusc Emerg Med 2020;28:113.
17. Yadav K, Sarioglu E, Choi HA, Cartwright WB 4th, Hinds PS, Chamberlain JM. Automated outcome classification of computed tomography imaging reports for pediatric traumatic brain injury. Acad Emerg Med 2016;23:171–8.
18. Chawla NV, Bowyer KW, Hall LO, Kegelmeyer WP. SMOTE: synthetic minority over-sampling technique. J Artif Intell Res 2002;16:321–57.
19. Chicco D, Totsch N, Jurman G. The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation. BioData Min 2021;14:13.
20. Siu BM, Kwak GH, Ling L, Hui P. Predicting the need for intubation in the first 24 h after critical care admission using machine learning approaches. Sci Rep 2020;10:20931.
21. Politano AD, Riccio LM, Lake DE, et al. Predicting the need for urgent intubation in a surgical/trauma intensive care unit. Surgery 2013;154:1110–6.
22. Olaussen A, Blackburn T, Mitra B, Fitzgerald M. Review article: shock index for prediction of critical bleeding post-trauma: a systematic review. Emerg Med Australas 2014;26:223–8.
23. Tran A, Yates J, Lau A, Lampron J, Matar M. Permissive hypotension versus conventional resuscitation strategies in adult trauma patients with hemorrhagic shock: a systematic review and meta-analysis of randomized controlled trials. J Trauma Acute Care Surg 2018;84:802–8.
24. Saito T, Rehmsmeier M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS One 2015;10:e0118432.
25. Basu S, Faghmous JH, Doupe P. Machine learning methods for precision medicine research designed to reduce health disparities: a structured tutorial. Ethn Dis 2020;30(Suppl 1):217–28.
26. Futoma J, Simons M, Panch T, Doshi-Velez F, Celi LA. The myth of generalisability in clinical research and machine learning in health care. Lancet Digit Health 2020;2:e489–92.
27. Grasner JT, Meybohm P, Lefering R, et al. ROSC after cardiac arrest: the RACA score to predict outcome after out-of-hospital cardiac arrest. Eur Heart J 2011;32:1649–56.



Capsule Summary

What is already known

Preclinical airway management is a high-risk procedure. Apart from a Glasgow Coma Scale score below 9 or acute respiratory insufficiency, few criteria exist to predict the need for preclinical airway management.

What is new in the current study

We developed and validated a machine learning model to predict the need for airway management in injured patients.

Fig. 1.

Flowchart of patient selection, dataset creation, and analysis. SQR-BW, Center for Quality Management in Emergency Medical Services Baden-Wuerttemberg; MIND, minimal emergency dataset; SMOTE, synthetic minority oversampling technique. a) A total of 24 attributes were included: more than 550 attributes were first filtered by causality or potential correlation and then selected by principal component analysis (wrapper).
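SMOTE, the oversampling step listed in Fig. 1, addresses class imbalance (here, airway management in 5.7% of patients). The sketch below illustrates the core idea from Chawla et al. (ref. 18) in plain Python: each synthetic minority case is placed at a random point on the line segment between a minority case and one of its k nearest minority-class neighbours. It is a simplified illustration, not the authors' pipeline. The function name `smote`, the default `k=5`, drawing base cases at random (the original algorithm cycles through every minority case), and support for numeric attributes only are all assumptions.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE sketch (hypothetical helper): returns n_synthetic new minority points.

    Each new point lies on the segment between a randomly chosen minority
    sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority-class samples")
    rng = random.Random(seed)  # seeded for reproducibility
    k = min(k, len(minority) - 1)  # at most all other minority samples
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [x for j, x in enumerate(minority) if j != i]
        # k nearest neighbours of the base sample by Euclidean distance
        neighbours = sorted(others, key=lambda x: math.dist(base, x))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append([b + gap * (nb - b) for b, nb in zip(base, neighbour)])
    return synthetic


# Toy minority class with two numeric attributes (hypothetical values)
minority_cases = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_cases = smote(minority_cases, n_synthetic=3, k=2)
```

Oversampling of this kind is generally applied only to the training portion of each cross-validation fold, so that synthetic cases never appear in the data used for evaluation.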

Fig. 2.

Averaged (A) receiver operating characteristic curves for overall performance and (B) precision-recall curves for the prediction of airway management by the Naive Bayes and random forest algorithms. AUC, area under the curve; CI, confidence interval.
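Fig. 2 and Table 2 report areas under both ROC and precision-recall curves. As a minimal sketch, assuming binary labels (1 = airway management) and one continuous model score per patient, the ROC area can be computed as the probability that a randomly chosen positive case is scored above a randomly chosen negative one (ties count one half), and the precision-recall area as average precision, a step-wise sum over the ranked cases. Software packages differ in how they interpolate the PR curve, so these hypothetical helpers (`roc_auc`, `average_precision`) need not reproduce Table 2 exactly. The distinction matters for imbalanced data (ref. 24): a classifier that ranks patients at random has an expected ROC area of 0.5, but an expected PR area close to the prevalence, about 0.057 for airway management in this cohort.

```python
def roc_auc(labels, scores):
    """AUC-ROC as P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    Ranks cases by descending score and averages the precision observed at
    the rank of each positive case; ties keep their input order.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive case")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive case
    return total / n_pos


labels = [1, 1, 0, 1, 0, 0]              # 1 = airway management (toy data)
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]  # model scores for the positive class
print(roc_auc(labels, scores))            # about 0.889: 8 of 9 pairs ordered correctly
print(average_precision(labels, scores))  # about 0.917
```

The PR area for excluding airway management (Table 2, footnote b) follows the same way by treating the other class as positive, e.g. `average_precision([1 - y for y in labels], [-s for s in scores])`.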

Fig. 3.

Attribute weighting in the random forest model, given as means with standard deviation error bars.

Table 1.

Clinical findings and medical treatments for both classes with the attributes selected through the principal component analysis

| Attribute | Airway management: yes (n=1,451) | Airway management: no (n=24,105) | P-value |
|---|---|---|---|
| Auscultation 1 | | | <0.01a) |
|   Obstruction/gasping/apnea | 15.0 | 0.3 | |
|   Bronchial spasm | 18.0 | 0.3 | |
|   Rhonchi | 2.0 | 0.2 | |
|   Other | 31.0 | 13.0 | |
| Auscultation 2 | | | |
|   Dyspnea ± cyanosis | 37.0 | 3.0 | <0.01a) |
| Head injury | | | |
|   None | 36.0 | 65.0 | <0.01a) |
|   Mild | 5.0 | 20.0 | <0.01a) |
|   Moderate | 15.0 | 13.0 | 0.05a) |
|   Severe | 44.0 | 2.0 | <0.01a) |
| Face injury | | | <0.01a) |
|   None | 77.0 | 83.0 | |
|   Mild | 5.0 | 10.0 | |
|   Moderate | 11.0 | 7.0 | |
|   Severe | 7.0 | 0.6 | |
| Cervical spine injury | | | |
|   None | 90.0 | 88.0 | 0.07 |
|   Mild | 2.0 | 6.0 | <0.01a) |
|   Moderate | 4.0 | 5.0 | 0.40 |
|   Severe | 4.0 | 0.7 | <0.01a) |
| Thoracic/lumbar spine injury | | | <0.01a) |
|   None | 90.0 | 85.0 | |
|   Mild | 1.0 | 5.5 | |
|   Moderate | 4.0 | 8.0 | |
|   Severe | 4.0 | 1.0 | |
| Thoracic injury | | | |
|   None | 68.0 | 77.0 | <0.01a) |
|   Mild | 3.0 | 9.0 | <0.01a) |
|   Moderate | 10.0 | 12.0 | 0.10 |
|   Severe | 19.0 | 2.0 | <0.01a) |
| Abdominal injury | | | |
|   None | 85.0 | 92.0 | <0.01a) |
|   Mild | 1.0 | 2.0 | <0.01a) |
|   Moderate | 4.0 | 4.0 | 0.50 |
|   Severe | 10.0 | 1.0 | <0.01a) |
| Pelvic injury | | | |
|   None | 83.0 | 87.0 | <0.01a) |
|   Mild | 2.0 | 5.0 | <0.01a) |
|   Moderate | 4.0 | 6.0 | 0.02a) |
|   Severe | 1.0 | 2.0 | <0.01a) |
| Lower limb injury | | | <0.01a) |
|   None | 76.0 | 72.0 | |
|   Mild | 4.0 | 12.0 | |
|   Moderate | 7.0 | 12.0 | |
|   Severe | 13.0 | 3.0 | |
| Oxygen therapy | 57.0 | 35.0 | <0.01a) |
| Noninvasive ventilation | 32.0 | 0.2 | <0.01a) |
| Thoracic drainage | 14.0 | 0.2 | <0.01a) |
| Colloid infusion | 7.0 | 0.2 | <0.01a) |
| Tranexamic acid | 40.0 | 4.0 | <0.01a) |
| Pelvic sling | 27.0 | 4.0 | <0.01a) |
| Catecholamine | 43.0 | 1.0 | <0.01a) |
| Initial vital signs | | | |
|   Systolic blood pressure (mmHg) | 137 ± 29 | 138 ± 28 | 0.29 |
|   Oxygen saturation (%) | 94 ± 7 | 95 ± 6 | <0.01a) |
|   Heart rate (beats/min) | 90 ± 20 | 89 ± 19 | <0.01a) |
|   Respiratory rate (breaths/min) | 16 ± 5 | 16 ± 5 | 0.24 |
| Pain level (0–10)b) | 5 (0–10) | 5 (0–10) | <0.01a) |
| Shock index | 0.7 ± 0.3 | 0.6 ± 0.6 | 0.03a) |
| Preemergency status (1–4)c) | 2 (1–3) | 2 (1–3) | <0.01a) |
| Glasgow Coma Scale (3–15) | 15 (14–15) | 15 (15–15) | <0.01a) |
| Age (yr)d) | 54.88 ± 21.44 | 55.80 ± 22.28 | 0.13 |
| Male sexd) | 72.0 | 60.0 | <0.01a) |

Values are presented as percentage, mean ± standard deviation, or median (interquartile range).

a) Statistically significant value (P<0.05).

b) No pain, 0.

c) Healthy, 1; moribund, 4.

d) Baseline characteristics not used in the algorithm.
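Table 1 lists the shock index as a model attribute; it is conventionally defined as heart rate divided by systolic blood pressure. A minimal sketch with two hypothetical patients (the helper name `shock_index` and the values are illustrative only) shows one caveat when reading the table, assuming the tabulated shock index is a mean of per-patient ratios: that mean generally differs from mean heart rate divided by mean systolic pressure, so the shock-index row cannot be recomputed exactly from the vital-sign rows.

```python
from statistics import mean


def shock_index(heart_rate, systolic_bp):
    """Shock index = heart rate (beats/min) / systolic blood pressure (mmHg)."""
    if systolic_bp <= 0:
        raise ValueError("systolic blood pressure must be positive")
    return heart_rate / systolic_bp


# Two hypothetical patients as (heart rate, systolic blood pressure)
patients = [(120, 80), (60, 120)]
per_patient = [shock_index(hr, sbp) for hr, sbp in patients]
ratio_of_means = shock_index(mean(hr for hr, _ in patients), mean(sbp for _, sbp in patients))
print(mean(per_patient))  # 1.0: mean of the individual ratios (1.5 and 0.5)
print(ratio_of_means)     # 0.9: mean heart rate 90 / mean systolic pressure 100
```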

Table 2.

Model performance and evaluation of random forest versus Naive Bayes

| Variable | Random forest | Naive Bayes | P-value |
|---|---|---|---|
| Overall correctness (%) | 97.80 ± 0.37 (97.57–98.03) | 93.55 ± 0.71 (93.11–93.99) | <0.01a) |
| Kappa | 0.78 ± 0.04 (0.75–0.80) | 0.54 ± 0.03 (0.52–0.56) | <0.01a) |
| AUC-ROC | 0.96 ± 0.01 (0.96–0.97) | 0.93 ± 0.00 (0.92–0.93) | <0.01a) |
| MCC | 0.78 ± 0.04 (0.76–0.80) | 0.56 ± 0.02 (0.54–0.57) | <0.01a) |
| Sensitivity | 0.73 ± 0.05 (0.71–0.76) | 0.75 ± 0.02 (0.73–0.76) | 0.38 |
|   Positive predictive value | 0.85 ± 0.03 (0.84–0.87) | 0.46 ± 0.03 (0.44–0.49) | <0.01a) |
|   PRC area, prediction of airway managementb) | 0.83 ± 0.04 (0.80–0.85) | 0.66 ± 0.09 (0.61–0.72) | <0.01a) |
| Specificity | 0.993 ± 0.002 (0.992–0.994) | 0.947 ± 0.008 (0.942–0.952) | <0.01a) |
|   Negative predictive value | 0.984 ± 0.006 (0.980–0.987) | 0.984 ± 0.001 (0.983–0.985) | 0.85 |
|   PRC area, exclusion of airway managementb) | 0.996 ± 0.001 (0.996–0.997) | 0.992 ± 0.001 (0.992–0.993) | <0.01a) |

Values are presented as mean ± standard deviation (95% confidence interval).

AUC-ROC, area under the receiver operating characteristic curve; MCC, Matthews correlation coefficient; PRC, precision-recall.


a) Statistically significant value (P<0.05).

b) Given for the prediction and exclusion of airway management.
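Table 2 reports threshold-based metrics alongside the curve areas. A minimal sketch of how they follow from a 2 × 2 confusion matrix is given below; the hypothetical helper `binary_metrics` is illustrative, and the first set of counts is invented, not study data. The second call uses only the class sizes from Table 1 and shows why kappa and MCC (ref. 19) are informative here: a model that never predicts airway management would be about 94.3% accurate, yet its kappa and MCC are 0. The `fold_summary` helper assumes that the intervals in Table 2 are normal-approximation 95% CIs over k cross-validation folds (mean ± 1.96 × SD/√k). The table note does not state how they were computed; with an assumed k = 10, this formula reproduces, for example, the random forest accuracy interval (97.80 ± 1.96 × 0.37/√10 ≈ 97.57–98.03). Both the formula and k = 10 are assumptions.

```python
import math
from statistics import mean, stdev


def binary_metrics(tp, fp, fn, tn):
    """Threshold-based metrics from a 2 x 2 confusion matrix (positive class = airway management)."""
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Cohen's kappa: observed agreement corrected for the agreement expected by chance
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance) if p_chance < 1 else 0.0
    # Matthews correlation coefficient; set to 0 when a row or column of the matrix is empty
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    def ratio(a, b):
        # undefined ratios (e.g. PPV when nothing is predicted positive) become NaN
        return a / b if b else float("nan")

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "mcc": mcc,
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


def fold_summary(values, z=1.96):
    """Mean, sample SD, and normal-approximation CI of a metric across CV folds (assumed method)."""
    m, sd = mean(values), stdev(values)
    half_width = z * sd / math.sqrt(len(values))
    return m, sd, (m - half_width, m + half_width)


# Hypothetical confusion matrix (not study data): 100 positives, 900 negatives
example = binary_metrics(tp=70, fp=10, fn=30, tn=890)
# All-negative classifier with the Table 1 class sizes (1,451 with, 24,105 without airway management)
all_negative = binary_metrics(tp=0, fp=0, fn=1451, tn=24105)
print(round(all_negative["accuracy"], 4), all_negative["kappa"], all_negative["mcc"])  # 0.9432 0.0 0.0
```

In the hypothetical example, accuracy is 0.96 while kappa is about 0.76, which illustrates how much of the raw accuracy on imbalanced data is attributable to chance agreement with the majority class.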