D. Polan¹, C. Hadley¹, C. Matrosic¹, M. Grubb¹, S. Jolly¹, P. A. Paximadis², and M. M. Matuszak¹; ¹Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, ²Department of Radiation Oncology, Corewell Health South, St. Joseph, MI
Purpose/Objective(s): Knowledge-based planning (KBP) has been demonstrated to be a valuable tool for consistently guiding radiation therapy (RT) plan optimization and assessing plan quality. However, vendor-specific implementations make it challenging to distribute necessary KBP components, such as dose prediction models, for multi-institutional use. Implementing vendor-neutral, open-source KBP frameworks may therefore help reduce distribution barriers and ensure equal access to quality assurance and improvement tools. This work evaluates the performance of independent deep learning (DL)-based dose prediction models for use in a distributable KBP framework to support thoracic RT quality improvement in a statewide radiation oncology quality consortium.
Materials/Methods: A total of 202 conventionally fractionated lung cancer RT plans clinically delivered between 2022 and 2023 were collected from 22 institutions using a variety of commercial treatment planning systems. All plans were optimized for 60 Gy in 30 fractions and met consortium guidelines for contour and dosimetric quality. Data were divided into 162 training cases and 40 validation cases, retaining at least one case from each institution for the training group. Two open-source DL architectures, a hierarchically densely connected U-net (HD U-net) and a Cascading U-net (C3D), were selected for evaluation based on their previously demonstrated performance in head-and-neck cancer dose prediction. Model performance was evaluated using voxel-based and DVH-based mean absolute error (MAE) and 13 specific DVH metrics of interest for lung-directed RT.
Results: Based on MAE, the C3D architecture outperformed the HD U-net, with voxel and DVH scores of 1.8 ± 1.2 Gy (3.0 ± 2.0% of prescription dose) and 1.4 ± 0.6, respectively, compared to 2.4 ± 1.4 Gy (4.0 ± 2.3%) and 1.4 ± 0.5. For DVH metrics, the C3D model provided significantly (paired t-test, p < 0.05) better performance for Heart D0.03cc, Lungs-GTV/IGTV V5Gy, and PTV D0.1cc, whereas the HD U-net demonstrated better performance only for Lungs-GTV/IGTV V20Gy. For the remaining 9 DVH metrics, differences were not statistically significant. For both models, the largest average prediction errors were found for DVH metrics representing near-maximum dose (e.g., spinal cord D0.1cc). Between the models, prediction error for all evaluated metrics was moderately or significantly correlated (Spearman rank order) except for target D95%.
Conclusion: The evaluated prediction models provide accuracy comparable to that of previously reported studies despite application to a unique multi-institutional dataset with diverse planning methods. Although the C3D model outperformed the HD U-net, the correlations suggest that performance depends more on the case than on the architecture. Implementation of the models will assist in providing an equally accessible dose prediction model to enable statewide RT quality improvement.