Learning curve and period of experience required for the competent diagnosis of acute appendicitis using abdominal computed tomography: a prospective observational study
Abstract
Objective
To assess the learning curve of novice residents in diagnosing acute appendicitis using abdominal computed tomography (CT) scans.
Methods
This prospective observational study was conducted over a 4-month period from March 1 to June 30, 2015. After CT scans for right lower quadrant pain or similar acute abdomen were evaluated, postgraduate year 1 (PGY-1) residents completed an interpretation checklist. The primary outcome was the learning curve for competent CT scan interpretation under suspicion of acute appendicitis. Secondary outcomes were the learning curves according to the cumulative number of abdominal CT interpretations regardless of initial clinical impression and according to the training period.
Results
PGY-1 residents recorded a total of 230 interpretation checklists. There were 53, 51, 46, 44, and 36 checklists recorded by individual residents and 92, 92, 91, 91, and 61 respective training days in the emergency department, excluding rotation periods in other departments. After 16 to 20 interpretations of abdominal CT scans performed under suspicion of acute appendicitis, the residents could diagnose acute appendicitis with more than 95% accuracy. Overall, the sensitivity and specificity for diagnosing acute appendicitis were 97% (95% confidence interval, 94 to 100) and 83% (95% confidence interval, 80 to 87), respectively. After 61 to 80 abdominal CT interpretations regardless of suspicion of acute appendicitis and after 41 to 50 days in training, PGY-1 emergency department residents could diagnose acute appendicitis with more than 95% accuracy.
Conclusion
PGY-1 residents require 16 to 20 checklist interpretations to achieve acceptable accuracy in abdominal CT interpretation for suspected acute appendicitis. After 61 to 80 abdominal CT interpretations regardless of suspicion of acute appendicitis, they could diagnose acute appendicitis with acceptable accuracy.
INTRODUCTION
Abdominal computed tomography (CT) scans are no longer considered a special investigation in the emergency department (ED). Because missed or delayed diagnoses are associated with high morbidity and mortality, the expeditious differential diagnosis of the acute abdomen is necessary [1-3]. Emergency physicians should be proficient at CT scan interpretation because clinical decision-making depends on image findings as well as physical examination, clinical history, and laboratory results. However, image interpretation by a radiologist is not always available.
Acute appendicitis is the most common surgical cause of acute abdominal pain [4,5]. Abdominal CT is generally recognized as the best imaging modality to diagnose acute appendicitis in adult patients [6,7]. Emergency physicians at many academic hospitals select abdominal CT as a standard workup tool to evaluate acute appendicitis [3,7,8]. In previous studies, the ability of radiology residents to interpret CT scan images has been established on the basis of discrepancies between their readings and those of attending radiologists [9-11]. However, there have been few reports regarding adequate interpretation experience among ED residents [12].
Radiology training programs have diverse subdivisions, such as neurology, chest, abdomen, musculoskeletal, and interventional radiology [13]. There may be differences among academic hospitals, but most ED training programs do not cover radiologic interpretation, and thus, it is difficult for emergency residents to learn systematic radiologic interpretation. ED residents usually perform CT to confirm suspected diseases, gradually becoming familiar with CT scan interpretation by comparison with radiologist readings and learning from senior residents or attending physicians. Likewise, our ED does not have a separate education program for novice residents to learn to read abdominal CT scans.
To the best of our knowledge, how much experience with preliminary CT interpretation is needed for ED residents to accurately assess acute appendicitis is not yet known. We hypothesized that ED residents would be able to diagnose acute appendicitis after adequate experience with CT interpretation checklists and bedside teaching. The objective of this study was to describe the learning curve of abdominal CT scan interpretation for acute appendicitis during the first 4 months of training of postgraduate year 1 (PGY-1) residents.
METHODS
Study setting and design
This was a prospective observational study performed in a tertiary academic hospital during the 4-month period from March to June 2015. The institutional review board approved this study (AS15030). Informed consent was not required as this study was performed as part of an education program.
This study was conducted in an ED with about 50,000 annual visits. The novice PGY-1 residents had not yet learned about abdominal CT interpretation and performed abdominal CT for patients with suspected acute appendicitis after history taking and physical examination. Residents then interpreted the abdominal CT scan alone and completed preliminary interpretation checklists (Appendix 1). Thereafter, senior residents or emergency attending physicians conducted more intensive interpretations and clinical decisions. The preliminary interpretation performed by the PGY-1 residents was not used for clinical decisions. Finally, the senior ED residents or attending physicians provided bedside teaching on abdominal CT images to the PGY-1 residents. Modification of preliminary interpretation checklists after bedside teaching or radiologist readings was prohibited. We expected that the ability to diagnose acute appendicitis with abdominal CT would gradually improve with accumulated experience and self-directed checklist interpretations.
Study protocol
Baseline characteristics of patients suspected to have acute appendicitis and who underwent abdominal CT were collected. Age, gender, the presence of right lower quadrant pain, the proportion of low-dose CT, Alvarado score, and the presence of acute appendicitis in the final interpretations were analyzed.
To collect preliminary interpretation results from the PGY-1 ED residents, an interpretation checklist was developed by ED staff. The checklist has been used as an ED resident reading form for abdominal CT scans since 2014. All PGY-1 residents recorded their preliminary interpretations on the sheets. Senior residents collected the final reports from radiologists. Trained researchers collated the sheets from PGY-1 residents and radiologists into a database. Finally, two board-certified emergency physicians assessed discrepancies between the two sheets. The kappa value was calculated to estimate inter-observer agreement.
Final radiologist interpretations were considered the gold standard. Interpretations of PGY-1 ED residents were evaluated using the prescribed protocol (Appendix 2). The sensitivity, specificity, positive predictive value, and negative predictive value of the residents’ report were calculated according to increasing numbers of preliminary interpretations.
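As a minimal sketch of the four measures named above, the following Python function computes them from a 2×2 table in which the radiologist's final report is the reference standard. The function name and the example counts are hypothetical and are not taken from the study data.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, PPV, and NPV from 2x2 counts.

    tp, fp, tn, fn count resident readings against the reference standard
    (here assumed to be the radiologist's final report).
    A measure whose denominator is zero is returned as None.
    """
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else None

    return {
        "sensitivity": ratio(tp, tp + fn),  # true positives among all diseased
        "specificity": ratio(tn, tn + fp),  # true negatives among all non-diseased
        "ppv": ratio(tp, tp + fp),          # true positives among positive readings
        "npv": ratio(tn, tn + fn),          # true negatives among negative readings
    }


# Hypothetical counts for illustration only (not study data).
example = diagnostic_metrics(tp=18, fp=3, tn=9, fn=0)
```

Applying the same function to each consecutive block of interpretations would give the per-interval values that a table of this kind reports.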
The primary outcome of this study was the evaluation of the learning curve for accurately diagnosing acute appendicitis according to the increasing number of abdominal CTs performed for suspected acute appendicitis. The secondary outcome was the learning curve according to the cumulative number of abdominal CTs performed by PGY-1 ED residents regardless of suspicion of acute appendicitis. The secondary outcome included all the cases in the primary outcome but did not include the CT scans ordered by doctors in other departments. We also investigated the number of training days in the ED, excluding rotation shifts in other departments, required to competently diagnose acute appendicitis.
Data analysis
Data were collected in an Excel database (Microsoft Corp., Redmond, WA, USA) and translated into SPSS and SAS formats. Analyses were performed with IBM SPSS Statistics ver. 20.0 (IBM Corp., Armonk, NY, USA) and SAS ver. 9.4 (SAS Institute, Cary, NC, USA). Two board-certified emergency physicians assessed the interpretation discrepancies between PGY-1 ED residents and radiologists. To assess inter-rater agreement, Cohen’s kappa coefficient was calculated. Cohen’s kappa coefficient equals 1 for perfect agreement and 0 for agreement no better than chance; negative values indicate agreement worse than chance. Kappa value comparisons were performed by analyzing 95% confidence intervals. P-values <0.05 were considered statistically significant throughout this study.
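The kappa calculation for two scorers who each mark a checklist as correct or incorrect can be sketched as follows. This is an illustration under stated assumptions, not the SPSS or SAS procedure used in the study: the function name is hypothetical, and the example counts assume 230 cases with two disagreements, with the split of the agreed cases chosen arbitrarily.

```python
def cohen_kappa_binary(both_correct, both_incorrect, only_a_correct, only_b_correct):
    """Cohen's kappa for two raters scoring each case as correct or incorrect."""
    n = both_correct + both_incorrect + only_a_correct + only_b_correct
    # Observed agreement: proportion of cases where both raters gave the same score.
    observed = (both_correct + both_incorrect) / n
    # Marginal proportion of "correct" scores for each rater.
    a_correct = (both_correct + only_a_correct) / n
    b_correct = (both_correct + only_b_correct) / n
    # Agreement expected by chance from the marginals.
    expected = a_correct * b_correct + (1 - a_correct) * (1 - b_correct)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical counts: 230 cases, two disagreements; the 190/38 split is assumed.
kappa = cohen_kappa_binary(both_correct=190, both_incorrect=38,
                           only_a_correct=1, only_b_correct=1)
```

Because kappa corrects for chance agreement, its value depends on how the agreed cases are split between correct and incorrect, not only on the number of disagreements.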
RESULTS
A total of 230 patients who were suspected to have acute appendicitis underwent abdominal CT. The baseline characteristics of the enrolled patients are summarized in Table 1. The average age was 45.2±17.4 years, and there were 118 male patients (51.3%). Most patients (210, 91.3%) complained of right lower quadrant pain when initially evaluated in the ED before CT scans were performed. There were 35 low-dose CTs (15.2%) performed. The average Alvarado score was 6.1±2.8. There were 156 patients (67.8%) diagnosed with acute appendicitis in the final radiologist’s report.
PGY-1 residents recorded a total of 230 preliminary interpretation checklists after performing abdominal CT scans for suspected acute appendicitis (Fig. 1). There were 53, 51, 46, 44, and 36 checklists recorded by the respective residents. The average interpretation accuracy with increased experience was 72% for 1 to 5 cases, 84% for 6 to 10 cases, 88% for 11 to 15 cases, 100% for 16 to 20, and 96% for 21 to 25 cases (Table 2). After 16 to 20 cases of preliminary interpretation of abdominal CT performed for suspected acute appendicitis, PGY-1 novice residents could diagnose acute appendicitis with more than 95% accuracy (Fig. 2A). Table 2 shows the interpretation accuracy according to the number of abdominal CT scans performed by each resident. There were some individual variations, but accuracy gradually improved.
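The interval-based accuracy described above can be sketched in Python as follows. This is a minimal illustration, assuming each resident's readings are stored in chronological order as booleans (True when the preliminary reading agreed with the radiologist). The function names and the sample sequence are hypothetical and do not reproduce the study data.

```python
def block_accuracy(outcomes, block_size=5):
    """Accuracy for consecutive blocks of one resident's readings.

    outcomes: booleans in chronological order (True = reading judged correct).
    Returns (first_case, last_case, accuracy) tuples; a trailing partial
    block is kept and scored on the cases it contains.
    """
    blocks = []
    for start in range(0, len(outcomes), block_size):
        chunk = outcomes[start:start + block_size]
        blocks.append((start + 1, start + len(chunk), sum(chunk) / len(chunk)))
    return blocks


def first_block_above(blocks, threshold=0.95):
    """Return the case range of the first block whose accuracy exceeds threshold."""
    for first_case, last_case, accuracy in blocks:
        if accuracy > threshold:
            return first_case, last_case
    return None


# Hypothetical sequence for one resident (not study data).
readings = [True, False, True, True, False,   # cases 1 to 5
            True, True, True, False, True,    # cases 6 to 10
            True, True, True, True, True]     # cases 11 to 15
curve = block_accuracy(readings)
reached = first_block_above(curve)
```

Averaging the per-block accuracies across residents would give one pooled value per interval, which is the form in which the learning curve is reported here.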
Sensitivity and negative predictive values were 100% for all intervals, excluding the group for 21 to 25 cases (Table 3). Specificity was initially 63% and gradually increased with accumulated interpretations, reaching 100% in the group of 16 to 20 cases. The positive predictive value was initially 46% and also gradually increased with accumulated interpretations, reaching 100% with 16 to 20 cases.
The total number of abdominal CT scans performed by individual residents regardless of their suspicion of acute appendicitis was 168, 157, 156, 149, and 122 during the study period. There were 156 patients diagnosed with acute appendicitis among 600 (120 per resident) cumulative CT scans regardless of suspicion of acute appendicitis (156/600, 26.0%). The average interpretation accuracy for acute appendicitis was 72% for 1 to 20 cases, 87% for 21 to 40 cases, 89% for 41 to 60 cases, 96% for 61 to 80 cases, and 100% for 81 to 100 and 101 to 120 cases (Table 4). After 61 to 80 cases of abdominal CT scans performed regardless of suspicion of acute appendicitis, ED residents could diagnose acute appendicitis with more than 95% accuracy (Fig. 2B).
During the 4-month research period, PGY-1 ED residents rotated through other clinical departments for a total of 1 or 2 months. The time during rotations in other clinical departments was excluded when calculating the total ED training period. The respective residents spent 92, 92, 91, 91, and 61 training days in the ED. The average interpretation accuracy was 65% for 1 to 10 days, 86% for 11 to 20 days, 90% for 21 to 30 days, 91% for 31 to 40 days, 100% for 41 to 50 days, and 96% for 51 to 60 days (Table 5). After 41 to 50 days of training, ED residents could diagnose acute appendicitis with more than 95% accuracy.
Two emergency attending physicians assessed the interpretation accuracy of ED residents. To evaluate inter-observer agreement, Cohen’s kappa coefficient was calculated as 0.969.
DISCUSSION
This study showed that, after 16 to 20 interpretations of abdominal CT scans performed for suspected acute appendicitis, PGY-1 ED residents diagnosed acute appendicitis with satisfactory accuracy. The pattern of improvement in accuracy differed among individuals. Although all PGY-1 ED residents had completed an internship just before participating in this study, there were likely to have been differences in previous experience and knowledge. Nevertheless, the average accuracy of interpretations improved with an increasing number of interpretations.
In the present study, two emergency attending physicians assessed the interpretation accuracy of the residents. Regarding the objective interpretation of the results, inter-scorer agreement was very high (kappa coefficient 0.969). Out of 230 cases, there were only two instances of discrepancy between scorers. In the first instance, the preliminary interpretation checklist described the appendix as being 4 mm in diameter and inflamed, but the final interpretation by the radiologist identified the appendix to be normal. Scorer A considered this case incorrect, whereas scorer B considered it correct because of the exact description of appendix size. In the second instance, the preliminary interpretation checklist described the appendix as not visualized, but the final interpretation by the radiologist found the appendix to be enlarged with a diameter of 13 mm due to secondary change. Scorer A considered this case correct, whereas scorer B considered it incorrect because the final interpretation did not exclude the possibility of acute appendicitis.
There were several discrepancies between the residents’ interpretations and those of the radiologists. For convenience, we classified those discrepancies as false positives and false negatives. The discrepancy was classified as a false positive if the radiologist identified no evidence of acute appendicitis even though the resident identified acute appendicitis. False positives were said to have occurred if: (1) the resident identified acute appendicitis because of an enlarged appendix (with a diameter of ≥7 mm), but the radiologist identified no appendicitis because of the lack of inflammatory signs, or (2) the resident identified acute appendicitis because of the presence of an appendicolith with a borderline diameter of 5 to 6 mm, but the radiologist identified it as a simple appendicolith and not appendicitis. Some false positives arose from differing views between the radiologists and ED residents about secondary inflammatory changes of the appendix. By contrast, a false negative was said to have occurred if the radiologist identified acute appendicitis even though the resident had not. In most instances of false negatives, we assumed that the ED residents might have missed the diagnosis because of the atypical location of the appendix.
The Alvarado scoring system was developed to improve physician accuracy in diagnosing acute appendicitis, and is based on 8 clinical factors [14]. This scoring system has been validated in several studies, yielding significant sensitivity and specificity. In this study, 156 (67.8%) patients were diagnosed with acute appendicitis based on CT scan interpretations, and the average Alvarado score was 6.1±2.8. The high score group (≥8 points) was more likely to have acute appendicitis than the low-score group (≤4 points) in the present study (91.1% vs. 42.3%).
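For readers unfamiliar with the score, the grouping used above can be sketched in Python. The source text states only that the score has 8 clinical factors; the item weights below follow the commonly published form of the Alvarado score (maximum 10 points) and are an assumption here, as are the function names, the key names, and the "intermediate" label for scores of 5 to 7, which this study does not define.

```python
# Assumed item weights (commonly published form of the score; maximum 10 points).
ALVARADO_POINTS = {
    "migration_of_pain": 1,
    "anorexia": 1,
    "nausea_or_vomiting": 1,
    "rlq_tenderness": 2,
    "rebound_pain": 1,
    "elevated_temperature": 1,
    "leukocytosis": 2,
    "neutrophil_left_shift": 1,
}


def alvarado_score(findings):
    """Sum the points of the findings marked True; missing findings count as absent."""
    unknown = set(findings) - set(ALVARADO_POINTS)
    if unknown:
        raise ValueError(f"unknown findings: {sorted(unknown)}")
    return sum(points for name, points in ALVARADO_POINTS.items()
               if findings.get(name, False))


def score_group(score):
    """Group a score using the cut-offs quoted in the text (>=8 high, <=4 low)."""
    if score >= 8:
        return "high"
    if score <= 4:
        return "low"
    return "intermediate"  # label assumed; not defined in the study


# Hypothetical patient for illustration only.
patient = {
    "migration_of_pain": True,
    "nausea_or_vomiting": True,
    "rlq_tenderness": True,
    "rebound_pain": True,
    "elevated_temperature": True,
    "leukocytosis": True,
}
patient_score = alvarado_score(patient)
patient_group = score_group(patient_score)
```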
Jo et al. [8] have reported that, using pathological findings as the gold standard, the accuracy of a CT scan diagnosis is statistically higher than that of the Alvarado score and the resident’s clinical prediction. In particular, the positive predictive values for acute appendicitis determined by emergency and surgery department residents were not significantly different. It is therefore reasonable to perform an abdominal CT scan before a surgical consultation. In our hospital, the emergency physicians usually evaluate an acute abdomen by using a CT scan as a primary tool before consulting a surgeon. In the present study, the positive predictive value of a resident’s prediction was 67.8%, whereas that of an Alvarado score of 8 or higher was 91.1%.
A previous study has reported on the learning curve of resident physicians using ultrasonography for diagnosing obstructive uropathy [15]. The physicians training in emergency ultrasonography were shown to accurately diagnose obstructive uropathy after 30 exams. Another study assessed the learning curve for coronary CT angiography. According to that study, although increasing experience with coronary CT angiography improved the diagnostic performance of inexperienced physicians, acquiring expertise in coronary CT angiography was a slow process and required more than 1 year of practice [16]. These previous studies show that considerable experience is required to diagnose specific diseases with a specific imaging modality. For abdominal CT scans, one of the most frequently used diagnostic modalities in the ED, there are few reports about the experience needed by emergency physicians for competent interpretation. We investigated the learning curve of abdominal CT scan interpretation for acute appendicitis among ED residents. These results will contribute to the creation of appropriate education protocols regarding abdominal images.
According to a recent study, low-dose CT was not inferior to conventional CT in diagnosing acute appendicitis [17]. In that study, low-dose CT scan images were interpreted by an attending radiologist with adequate experience in their interpretation. In our institution, low-dose CT scans are only performed among patients 15 to 44 years old if informed consent is obtained. In the current study, there were 35 low-dose CTs performed for suspected acute appendicitis out of a total of 230 CTs. Because low-dose CT scan images generally have lower resolution than conventional CT images, inexperienced physicians may have some difficulty in diagnosing acute appendicitis, which can affect the overall accuracy of CT scan interpretations.
Wechsler et al. [18] have reported that increased experience in CT interpretation reduces discrepancy rates between attending radiologists and radiology residents. Our study also showed that the accuracy of CT interpretation increased with a resident’s experience. Novice residents, lacking experience or the prerequisite knowledge, might be expected to miss abnormal findings rather than interpret normal anatomic structures as abnormal. In other words, a resident’s interpretation might be expected to have a relatively low sensitivity and a high specificity. However, in the present study, both sensitivity and negative predictive value were 100%, excluding the group within the 21 to 25 interval. The specificity and positive predictive value were both initially low (45% and 63%, respectively) and gradually increased, reaching 100% in the 16 to 20 interval. The overall sensitivity and specificity for the diagnosis of acute appendicitis were 97% (95% confidence interval, 94 to 100) and 83% (95% confidence interval, 80 to 87), respectively. Because novice residents might base their final interpretation on clinical history and physical examination, the sensitivity and negative predictive value were relatively high. In the management of patients with acute abdomen in the ED, it is more difficult to conclude that a patient is normal and can be discharged. This situation leads to a slowly increasing specificity and positive predictive value until residents gain confidence in their interpretation. Previous radiological studies have focused on using only image findings without clinical information such as present illness, physical exam, and laboratory results. However, the present study evaluated the interpretation capability for acute appendicitis with the clinical history available. The results may have been influenced by the inclination of novice ED residents to overestimate the possibility of acute appendicitis in preliminary interpretations of abdominal CT scan images.
One limitation of this study is that the results were derived from a small sample: there were only 5 PGY-1 residents in the ED. In addition, there was no control group for ethical reasons, and the subjects in the present study were all residents training in the ED. It was therefore difficult to evaluate the effect of preliminary interpretation and bedside teaching on interpretation accuracy during the study period. Another limitation is the criteria for suspecting acute appendicitis, which might differ between clinicians. Nevertheless, our inclusion criteria targeted patients suspected to have acute appendicitis based on a clinical impression. Although we assumed that the residents were all novices at interpreting abdominal CT scan images, there might have been significant differences in experience and knowledge among residents.
In conclusion, after 16 to 20 preliminary abdominal CT interpretations performed for suspected acute appendicitis, PGY-1 ED residents accurately diagnosed acute appendicitis. After 61 to 80 abdominal CT interpretations, regardless of the suspicion of acute appendicitis, and after 41 to 50 days of training, ED residents could diagnose acute appendicitis with more than 95% accuracy. Assessing learning curves allows the trainee learning process to be monitored. In the future, studies including larger populations would help to assess the CT interpretation learning curve for other diseases as well as acute appendicitis. Such studies would allow evaluation of the learning process and help to create concrete education protocols for radiologic images.
Notes
No potential conflict of interest relevant to this article was reported.
Acknowledgements
Jae-hyung Cha contributed to our work by advising about statistical analysis.
References
Appendices
Appendix 1. Preliminary interpretation checklist for ED residents (ceem-17-209-app.pdf)
Appendix 2. Protocol for assessing the two final reports (ceem-17-209-app.pdf)
Notes
Capsule Summary
What is already known
Abdominal computed tomography is generally recognized as the best imaging modality to diagnose acute appendicitis in adult patients.
What is new in the current study
Emergency department residents can diagnose acute appendicitis accurately after adequate experience with computed tomography interpretation checklists and bed-side teaching.