D. S. Bitterman1, C. V. Guthier1, K. Y. Shin2, R. Zeleznik3, P. Doyle4, M. Guevara5, H. Kang6, M. Dillon-Martin7, T. Perkins8, L. T. Orlina9, Y. Pei8, J. S. Bredfeldt5, P. J. Catalano10, J. R. Bellon11, R. S. Punglia11, L. Warren12, J. S. Wong11, D. E. Kozono1, and R. H. Mak1; 1Department of Radiation Oncology, Dana-Farber Cancer Institute/Brigham and Women’s Hospital, Harvard Medical School, Boston, MA, 2Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, 3Artificial Intelligence in Medicine (AIM) Program at Harvard, Boston, MA, 4Brigham and Women’s Hospital/Dana-Farber, Boston, MA, United States, 5Department of Radiation Oncology, Brigham and Women’s Hospital/Dana-Farber Cancer Institute, Boston, MA, 6Boston University Chobanian & Avedisian School of Medicine, Boston, MA, 7Brigham and Women’s Hospital/Dana-Farber, Boston, MA, 8Brigham and Women’s Hospital, Boston, MA, 9Department of Radiation Oncology, Dana-Farber Cancer Institute/Brigham and Women’s Hospital, Boston, MA, 10Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, 11Dana-Farber/Brigham and Women’s Cancer Center, Boston, MA, 12Department of Radiation Oncology, Dana-Farber Cancer Institute, Boston, MA
Purpose/Objective(s): Artificial intelligence (AI) normal tissue contouring tools are now widely available and used in many treatment planning systems. However, few studies have prospectively demonstrated the value of such tools when employed in clinical workflows. We conducted a randomized controlled trial to assess the benefit of using an AI heart contouring algorithm to assist breast radiotherapy (RT) planning.
Materials/Methods: A single-institution, 2-arm randomized controlled crossover trial was conducted between 12/2021 and 10/2023. A convolutional neural network for heart auto-contouring, which achieved a median Dice of 0.95 against gold standard contours and reduced contouring time by 50% in pre-clinical studies, was implemented as a scripted tool within a treatment planning system. Eligible patients had breast cancer and were planned for RT; their hearts were contoured according to the randomization of the dosimetrist assigned to their plan. Dosimetrists were stratified by experience and randomized to 1) manual heart contouring first or 2) AI-assisted heart contouring first. AI-assisted contouring consisted of running the model within the treatment planning system and editing the contour as needed. After completing 5 cases with the initial contouring strategy, dosimetrists crossed over to the other strategy. Co-primary outcomes were feasibility and efficacy, the latter defined as heart contouring time (measured by recording the screen during contouring). Secondary endpoints included dosimetrist and treating physician contour assessments. The trial had 90% power to detect a 30% reduction in contour time with a 2-sided type I error of 5%. Results: 118 patients were enrolled; 60 patients’ hearts were contoured manually and 58 with AI assistance. 11 dosimetrists were enrolled; 5 were randomized to the manual-first arm and 6 to the AI-assisted-first arm. There was no difference in manual vs. AI-assisted contour time overall (mean 277.3 ± 151.2 vs. 267.2 ± 199.0 secs, p=0.76), in the manual-first arm only (p=0.15), or in the AI-first arm only (p=0.67). There was also no difference in contour time among dosimetrists with ≤2 yr experience (mean 374.4 ± 174.2 vs. 403.3 ± 275.9 secs, p=0.72), nor among dosimetrists with >3 yr experience (mean 241.9 ± 126.4 vs. 210.8 ± 121.9 secs, p=0.25). 
Dosimetrists considered AI contours acceptable with minor/no modification in 13/47 (27.7%) cases and unacceptable in 34/47 (72.3%) cases, but considered the AI helpful in 35/47 (74.5%) cases and felt it improved subjective efficiency in 29/47 (61.7%) cases. Physicians, blinded to randomization, judged the contours presented for review unacceptable in 6/56 (10.7%) AI-assisted and 3/56 (5.4%) manual cases. Conclusion: Despite improving efficiency pre-clinically, AI assistance did not reduce heart contour time compared to standard manual contouring. Physicians were more likely to find contours unacceptable when AI assistance was used. Pre-clinical findings may not translate into clinical benefits, emphasizing the critical need to evaluate new AI technologies under their intended use in real clinical workflows.
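For readers unfamiliar with the Dice metric quoted in the Materials/Methods, the following is a minimal illustrative sketch of how a Dice similarity coefficient between a model contour and a reference contour is commonly computed, as 2|A∩B| / (|A| + |B|) over voxel masks. It is not the trial's evaluation code: the function name `dice_coefficient`, the flat 0/1 mask representation, the empty-mask convention, and the toy masks are all assumptions made for illustration.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity between two equally sized binary masks.

    Masks are flat sequences of 0/1 voxel labels (a hypothetical
    representation chosen to keep this sketch dependency-free).
    """
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    # Voxels labeled as heart in both masks.
    overlap = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    # Total labeled voxels across the two masks.
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    # Assumed convention: two empty masks count as perfect agreement.
    return 1.0 if total == 0 else 2.0 * overlap / total


# Toy, made-up masks (not trial data): 3 shared voxels, 4 labeled in each.
reference = [0, 1, 1, 1, 1, 0, 0, 0]
model = [0, 0, 1, 1, 1, 1, 0, 0]
print(dice_coefficient(reference, model))  # 2*3 / (4+4) = 0.75
```

A value of 1.0 indicates identical masks and 0.0 indicates no overlap, so a median of 0.95 reflects close volumetric agreement. As the trial results suggest, high overlap does not by itself imply that the contours will be judged clinically acceptable without editing.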