2372 - Evaluating Commercial Auto-Segmentation Software-Generated Contours on MRI and Their Clinical Acceptability for Cranial Stereotactic Radiosurgery
H. Moktan, R. D. Mali, Y. Haque, Y. Hou, K. Kane, S. S. Varghese, R. L. Rotondo, and K. Guida; Department of Radiation Oncology, University of Kansas Medical Center, Kansas City, KS
Purpose/Objective(s): Commercially available artificial intelligence (AI)-based auto-segmentation tools have become increasingly popular in radiation oncology workflows. AI can generate accurate and reproducible contours, improving patient throughput and contour standardization. For stereotactic radiosurgery (SRS), MRI is paramount for accurate delineation of targets as well as organs-at-risk (OARs). In this study, we evaluated the clinical acceptability of OAR contours auto-generated from MRI datasets by three commercially available software platforms.
Materials/Methods: Radformation AutoContour (AC), Limbus AI (LAI), and BrainLab Elements (E) were used to generate cranial OAR contours for 15 previously treated SRS cases using T1 post-contrast MRIs. The thin-slice MRI datasets varied in scanner and image quality to represent the spectrum encountered in clinical practice. Six OAR contours were evaluated: the brainstem, optic chiasm, and left/right optic nerves and hippocampi. In a blinded evaluation, radiation oncologists and residents ranked the OAR contours from 1 (best) to 3 (worst) and rated their clinical acceptability. These qualitative assessments were supplemented with quantitative measurements, including the Dice Similarity Coefficient (DSC), center-of-mass (COM) displacement, and volumetric differences, derived by comparison with physician-approved contours.
Results: LAI received the highest scores from the physician group, earning rankings of 1, 2, and 3 for 57%, 30%, and 12% of contours, respectively. AC (34%, 42%, 25%) and E (8%, 28%, 62%) ranked second and third, respectively. LAI received the top ranking for 5 of the 6 OARs, with AC performing best for the chiasm. Physician evaluations indicated that none of the auto-generated contours were clinically acceptable without adjustment. This qualitative assessment agreed with the low DSC scores (≤85% for the brainstem and ≤39% for all other structures) shown in the Table. The COM displacement was greater than 2.1 mm on average (range, 0.2-13.9 mm); however, these differences were not statistically significant among the software platforms. LAI showed the smallest range of volumetric differences between AI- and physician-generated OARs (0-30%), compared with AC (7-69%) and E (2-57%).
Conclusion: Limbus AI generated the best OAR contours on MRI datasets. All three software platforms performed reasonably well on larger structures such as the brainstem; however, accuracy and clinical acceptability diminished with decreasing OAR volume. Therefore, auto-segmentation based solely on MRI datasets was not clinically acceptable for cranial radiosurgery.
Table: DSCs (%) for contours auto-generated by the three AI software platforms compared with clinically used contours.
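As context for the quantitative metrics named in Materials/Methods, the following is a minimal sketch of how DSC, COM displacement, and percent volumetric difference can be computed from binary OAR masks. It assumes NumPy arrays resampled to a common voxel grid with known spacing; the function names and the synthetic example are illustrative and do not reflect the study's actual analysis pipeline.

import numpy as np

def dice_coefficient(a: np.ndarray, b: np.ndarray) -> float:
    # Dice Similarity Coefficient between two binary masks: 2|A∩B| / (|A|+|B|).
    a, b = a.astype(bool), b.astype(bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0

def com_displacement_mm(a: np.ndarray, b: np.ndarray, spacing) -> float:
    # Euclidean distance between the masks' centers of mass, in mm.
    # `spacing` is the voxel size per axis (mm); masks are assumed non-empty.
    com_a = np.array([idx.mean() for idx in np.nonzero(a)]) * np.asarray(spacing)
    com_b = np.array([idx.mean() for idx in np.nonzero(b)]) * np.asarray(spacing)
    return float(np.linalg.norm(com_a - com_b))

def volume_diff_percent(test: np.ndarray, reference: np.ndarray) -> float:
    # Absolute volume difference of the test mask relative to the reference,
    # as a percentage (voxel counts cancel the common voxel volume).
    return 100.0 * abs(int(test.sum()) - int(reference.sum())) / int(reference.sum())

if __name__ == "__main__":
    # Synthetic demo: a cubic "physician" mask and an auto contour shifted 2 voxels.
    physician = np.zeros((64, 64, 64), dtype=bool)
    physician[20:40, 20:40, 20:40] = True
    auto = np.roll(physician, shift=2, axis=0)
    spacing = (1.0, 1.0, 1.0)  # isotropic 1 mm voxels (assumption)
    print(f"DSC: {dice_coefficient(auto, physician):.3f}")
    print(f"COM displacement: {com_displacement_mm(auto, physician, spacing):.2f} mm")
    print(f"Volume difference: {volume_diff_percent(auto, physician):.1f}%")

In practice, the auto-generated and physician-approved structures would first be exported from the treatment planning system and rasterized onto the same MRI grid before applying metrics like these.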