E. Gibson¹, Y. Yoo¹, J. Lian², W. Hesheng³, C. Shen⁴, M. M. Kim⁵, D. Kondziolka⁶, Y. Cao⁵, and J. Balter⁵; ¹Siemens Healthineers, Princeton, NJ, ²University of North Carolina, Chapel Hill, NC, ³New York University, New York, NY, ⁴Department of Radiation Oncology, University of North Carolina, Chapel Hill, NC, ⁵Department of Radiation Oncology, University of Michigan, Ann Arbor, MI, ⁶Department of Neurosurgery, NYU Langone Health, New York, NY
Purpose/Objective(s): AI tools are increasingly used in Radiation Oncology to assist in delineating tumors and normal tissues, and there is a need to assess how they affect contouring accuracy compared with manually drawn structures. An AI-based prototype has recently been developed to aid detection and delineation of intracranial metastases by providing “pre-contours” to physicians. This study aimed to evaluate whether providing AI-generated pre-contours reduces interobserver variability in contouring brain metastasis GTVs on MR images, and to quantify the extent of editing these contours need.
Materials/Methods: Three clinicians contoured 149 brain metastasis GTVs on post-contrast T1w MPRAGE MR images from 20 patients from 3 centers. The median GTV (IQR, max) was 0.06 (0.03–0.19, 13.8) ml. After a 3-month period, the clinicians were presented with contours generated either by themselves or by an AI-based brain metastasis GTV contouring algorithm (randomized at the tumor level). Clinicians were asked to edit the contours until they deemed them clinically usable for SRS planning. Contour differences were measured using 95th percentile Hausdorff distances (HD95) and Dice similarity coefficients (DSC). The study compared (1) the interobserver variability of contours with and without AI pre-contours (with paired t-tests on GTVs pre-contoured by AI), and (2) the magnitude of contour editing for AI pre-contouring vs. clinicians’ own pre-contouring and vs. interobserver variability (with unpaired t-tests).
Results: The interobserver variability with AI pre-contours (HD95=0.82±0.44 mm, DSC=0.89±0.11) was significantly lower than without AI pre-contours (HD95=1.54±3.94 mm, DSC=0.72±0.17), with p=0.005 for HD95 and p<0.001 for DSC. Clinicians edited 31% (63/204) of their own GTV contours and 59% (141/243) of AI-generated contours. Clinician edits to their own pre-contours (HD95=0.15±0.36 mm, DSC=0.98±0.05) were significantly smaller than edits to AI pre-contours (HD95=0.45±0.56 mm, DSC=0.93±0.11; p<0.001), but both were significantly smaller than the interobserver variability with or without pre-contouring (p<0.001).
Conclusion: This study suggests that AI pre-contouring reduces interobserver variability in GTV contouring. The editing needed for AI pre-contours was much smaller than the interobserver variability in contouring, and AI pre-contours needed editing at only about twice the rate of clinicians’ own prior contours. Future work will analyze the impact of the interobserver differences and the pre-contour edits on treatment plans.
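For readers unfamiliar with the two contour-difference metrics named in the Methods, the following is a minimal sketch of how DSC and HD95 can be computed for two contours represented as sets of voxel coordinates. It is an illustration only and not the implementation used in the study: the function names, the nearest-rank percentile, the use of all contour voxels rather than surface voxels only, and the choice of the larger of the two directed 95th percentiles are all assumptions, and published HD95 definitions differ on these points.

```python
import math


def dice(a, b):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|) for two voxel sets."""
    if not a and not b:
        return 1.0  # assumed convention: two empty contours agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))


def _directed_distances(src, dst, spacing):
    """Distance (in mm) from each voxel in src to its nearest voxel in dst."""
    distances = []
    for p in src:
        nearest = min(
            math.sqrt(sum(((pi - qi) * s) ** 2 for pi, qi, s in zip(p, q, spacing)))
            for q in dst
        )
        distances.append(nearest)
    return distances


def _percentile_nearest_rank(values, pct):
    """Nearest-rank percentile (one of several common percentile conventions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def hd95(a, b, spacing=(1.0, 1.0, 1.0)):
    """95th percentile Hausdorff distance between two non-empty voxel sets.

    Assumption: the larger of the two directed 95th percentiles is reported.
    """
    d_ab = _directed_distances(a, b, spacing)
    d_ba = _directed_distances(b, a, spacing)
    return max(_percentile_nearest_rank(d_ab, 95), _percentile_nearest_rank(d_ba, 95))


# Hypothetical toy contours: two voxels each, overlapping in one voxel.
contour_a = {(0, 0, 0), (1, 0, 0)}
contour_b = {(1, 0, 0), (2, 0, 0)}
print(dice(contour_a, contour_b))
print(hd95(contour_a, contour_b, spacing=(1.0, 1.0, 1.0)))
```

The brute-force nearest-neighbor search is quadratic in the number of voxels, which is tolerable for very small structures such as the GTVs described here but would normally be replaced by a distance transform or a spatial index for larger ones.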
The concepts and information presented are based on research results that are not commercially available. Future commercial availability cannot be guaranteed. Research reported in this publication was supported by the National Cancer Institute of the National Institutes of Health under Award Number R01CA262182. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.