Z. Kaffey1, K. A. Wahid2, D. Farris3, L. Humbert-Vidan4, T. Netherton5, G. Balakrishnan6, A. C. Moreno4, M. Naser4, C. D. Fuller4, D. Fuentes2, and M. Dohopolski7; 1University of Texas MD Anderson Cancer Center UTHealth Graduate School of Biomedical Sciences, Houston, TX, 2Department of Imaging Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, 3Research Medical Library, The University of Texas MD Anderson Cancer Center, Houston, TX, 4Department of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, 5Department of Radiation Physics, The University of Texas MD Anderson Cancer Center, Houston, TX, 6Rice University, Houston, TX, 7Department of Radiation Oncology, The University of Texas Southwestern Medical Center, Dallas, TX
Purpose/Objective(s): Artificial intelligence (AI) applications within radiotherapy (RT) have shown immense promise; however, methods describing AI uncertainty, i.e., uncertainty quantification (UQ), are critical for any future real-world clinical applications. Therefore, UQ is being increasingly studied in RT applications. Despite the potential importance of UQ, no systematic or scoping reviews have been conducted on UQ applications within RT-applied AI. This study evaluates the current state of AI UQ in RT applications, identifies existing gaps and areas for improvement, and proposes directions for future research. We hypothesized that most articles would investigate auto-contouring and utilize commonly established UQ methodology.
Materials/Methods: Our scoping review adhered to the PRISMA-ScR guidelines using the population, concept, context framework, which required articles to report on clinically geared RT applications of AI with UQ. This framework guided the formulation of our search strategy, which aimed to capture the breadth and depth of the subject matter. A comprehensive search was executed across multiple databases, including PubMed, Web of Science, and Cochrane Library. A total of 8,974 articles were screened by two independent reviewers, with a third reviewer resolving conflicting assignments. Of these, 151 articles were sent for full-text review. RT-specific information, AI design features, and UQ methodologies were extracted from the final studies.
Results: We identified 56 final articles (2015-2024). In terms of RT application spaces, most studies evaluated UQ for auto-contouring (50%), followed by image synthesis (13%) and multiple applications simultaneously (11%). Head and neck cancer was the most common disease site independent of application space (32%). Imaging data were used in 91% of studies, while only 13% incorporated RT dose information. Median (interquartile range) patient training, validation, and test set sizes were 63 (142.25), 10 (31.5), and 25 (46), respectively, with most studies utilizing supervised learning frameworks (88%). Most studies focused on failure detection as the main application of UQ (60%), with Monte Carlo Dropout being the most commonly implemented UQ method (32%), followed by ensembling (16%). Overall, 55% of studies did not share code or datasets for public re-use.
Conclusion: AI UQ marks a significant move towards impactful RT advancements. However, our review revealed a lack of diversity in RT applications beyond auto-contouring, e.g., few studies related to dose prediction. Moreover, there is a clear need to study additional methods of UQ, such as conformal prediction.
Finally, our results may incentivize the development of standardized guidelines for transparent reporting and implementation of UQ in RT.