McGaw Medical Center of Northwestern University, Chicago, IL
Q. S. Zhang1, J. Kang2, K. Lybarger3, M. Glenn4, P. A. Sponseller2, M. H. Blau2, and E. C. Ford2; 1Northwestern University Feinberg School of Medicine, Lurie Cancer Center, Department of Radiation Oncology, Chicago, IL, 2University of Washington School of Medicine, Fred Hutch Cancer Center, Department of Radiation Oncology, Seattle, WA, 3George Mason University, Fairfax, VA, 4The University of Texas MD Anderson Cancer Center, Department of Radiation Physics, Houston, TX
Purpose/Objective(s): Incident learning provides valuable narrative information about safety and quality, but it is difficult to analyze because of the large volume of unstructured data. We present a semi-automated method to categorize safety-related event reports and validate this method in the Radiation Oncology context.
Materials/Methods: We employed natural language processing (NLP) and Bayesian statistical topic modeling to analyze English text from 7,174 safety-related event reports in a Radiation Oncology department, dated between February 2012 and April 2021. Word-like, space-delimited units of text called "tokens" were pre-processed and filtered for meaningful content, resulting in 4,216 tokens. We then retained only reports that contained at least one of the filtered tokens, resulting in 6,760 reports. We fit a topic model with K=50 topics, estimating posterior probability distributions over tokens and the proportion of each report that was about each of the 50 topics. The topic model allowed a report to be about multiple topics. To validate the model, we assessed whether experts could agree on meaningful labels for the topics. Five experts first independently assigned a label to each topic by reading its ten most probable tokens along with the tokens’ posterior probabilities (“top ten tokens” approach). Experts then performed a separate assessment by reading any reports that were estimated to be at least 90% about a topic (“case reports” approach). Consensus topic labels (CTLs) such as “Brachytherapy” or “Orders” were then assigned to topics by majority vote under each approach.
Results: Of the 50 topics from the model, 37 (74%) had expert agreement on the CTL assignment (at least one majority-vote CTL from either approach), which provides evidence for the validity of the model. The top ten tokens approach yielded a majority-vote CTL for 36 topics. The case reports approach was applicable to 20 topics, and 18 of these had a CTL assigned via expert agreement.
Conclusion: We have demonstrated an NLP/statistical modeling approach that provides a semi-automated method to categorize safety-related event reports in Radiation Oncology by topic, allowing a report to be about multiple topics. This approach offers an expedited alternative to having a human reader review large volumes of text reports.
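As an illustration only, the two validation steps described in the Methods (selecting reports estimated to be at least 90% about a topic, and assigning a consensus label by majority vote) might be sketched as below. The function names, the layout of `doc_topic` as one row of topic proportions per report, and the reading of "majority vote" as a strict majority of all experts are assumptions made for this sketch, not details taken from the study.

```python
from collections import Counter
from typing import List, Optional, Sequence


def reports_about_topic(
    doc_topic: Sequence[Sequence[float]], topic: int, threshold: float = 0.9
) -> List[int]:
    """Return indices of reports whose estimated proportion for `topic`
    is at least `threshold` (the "case reports" selection step).

    `doc_topic[i][k]` is assumed to hold the estimated proportion of
    report i that is about topic k.
    """
    return [i for i, row in enumerate(doc_topic) if row[topic] >= threshold]


def consensus_label(votes: Sequence[Optional[str]]) -> Optional[str]:
    """Return the label chosen by a strict majority of experts, else None.

    `votes` holds one entry per expert; None means the expert gave no label.
    Assumption: a majority means more than half of all experts, including
    those who abstained.
    """
    counts = Counter(v for v in votes if v is not None)
    if not counts:
        return None
    label, n = counts.most_common(1)[0]
    return label if n > len(votes) / 2 else None


# Toy example: three reports, two topics (values are made up).
toy_doc_topic = [[0.95, 0.05], [0.50, 0.50], [0.10, 0.90]]
selected = reports_about_topic(toy_doc_topic, topic=0)

# Toy example: five hypothetical expert votes for one topic.
label = consensus_label(
    ["Brachytherapy", "Brachytherapy", "Brachytherapy", "Orders", None]
)
```

Under these assumptions, a topic for which no label reaches a strict majority simply receives no CTL, which corresponds to the topics in the Results that lacked expert agreement.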