E. J. Heo1, S. H. Cho2, M. Kwak3, K. H. Chang4, J. B. Shim2, N. K. Lee2, and S. Lee2; 1Department of Medical Physics, Graduate School of Korea University, Sejong, Korea, Republic of (South), 2Department of Radiation Oncology, College of Medicine, Korea University, Seoul, Korea, Republic of (South), 3Department of Engineer, Majung solution, Goyang, Korea, Republic of (South), 4Department of Radiologic Science, Far East University, Chungcheongbuk-do, Korea, Republic of (South)
Purpose/Objective(s): To improve physician satisfaction with auto-segmentation results for breast cancer, we developed an on-site trained auto-segmentation model using our institutional data and evaluated its feasibility against the performance of a commercial auto-segmentation model. Materials/
Methods: Retrospective data from 80 breast cancer patients at our institution were obtained. A physician contoured 5 OARs (heart, left lung, right lung, esophagus, and thyroid) according to institutional guidelines. To generate our institutional data, a physician edited the contours that the auto-segmentation model had prospectively generated for each patient. Data from 50 patients were used to develop the on-site trained auto-segmentation model, which was trained with our institutional data and based on a residual U-Net model. A subset of 30 patients was used to validate the on-site trained model. The reference dataset consisted of the contours of 30 breast cancer patients previously treated at our institution. We generated OAR datasets with two different models and compared each with the reference dataset: (A) a commercial auto-segmentation solution trained with a commercial dataset, and (B) the same commercial auto-segmentation solution trained on-site with our institutional data. The Dice similarity coefficient (DSC) and the 95% Hausdorff distance (HD95) were used for comparison. A Wilcoxon signed-rank test was performed to assess differences between the two auto-segmentation models (p-value<0.05). The physician scored the auto-segmented contours by rating the level of satisfaction. Results: In the validation of the on-site trained model, DSC differed significantly between the two models for every OAR except the left lung (heart: 0.93±0.03 vs. 0.96±0.01, p-value<0.05, left lung: 0.98±0.01 vs. 0.97±0.01, p-value=0.057, right lung: 0.98±0.00 vs. 0.98±0.00, p-value=0.007, esophagus: 0.78±0.05 vs. 0.75±0.05, p-value=0.001, thyroid: 0.84±0.04 vs. 0.79±0.07, p-value<0.001). HD95 also differed significantly between the two models for the heart, esophagus, and thyroid (heart: 14.61±8.86 mm vs. 5.69±3.02 mm, p-value<0.001, esophagus: 6.37±4.18 mm vs. 
10.96±5.84 mm, p-value<0.001, thyroid: 4.22±2.29 mm vs. 5.34±2.05 mm, p-value=0.039). The inferior borders of the heart, left lung, and right lung were poorly auto-segmented by the on-site trained model because of low contrast, and HD95 is sensitive to such locally poor auto-segmentation results. Nevertheless, physician satisfaction was higher for the on-site trained auto-segmentation model. Conclusion: We demonstrated the feasibility of using an on-site trained auto-segmentation model built from institutional data to improve physician satisfaction with auto-segmentation performance for breast cancer.
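
As an illustration of the two geometric metrics named in the Methods, the following is a minimal sketch in plain Python, not the implementation used in this study. It assumes each contour is given as a set of voxel coordinates in millimetres, that HD95 is taken as the larger of the two directed 95th-percentile distances (other definitions exist), and that distances are computed over all voxels rather than surface voxels only. The function names `dice`, `percentile`, and `hd95`, and the toy masks `ref` and `auto`, are hypothetical.

```python
import math


def dice(a, b):
    """Dice similarity coefficient between two sets of voxel coordinates."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # two empty contours are treated as identical (assumption)
    return 2.0 * len(a & b) / (len(a) + len(b))


def percentile(values, q):
    """Percentile with linear interpolation between ranks, q in [0, 100]."""
    v = sorted(values)
    pos = (len(v) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return v[lo] + (v[hi] - v[lo]) * (pos - lo)


def _directed_distances(src, dst):
    """Distance from every point in src to its nearest point in dst."""
    return [min(math.dist(p, q) for q in dst) for p in src]


def hd95(a, b):
    """95% Hausdorff distance: max of the two directed 95th percentiles."""
    a, b = list(a), list(b)
    return max(
        percentile(_directed_distances(a, b), 95),
        percentile(_directed_distances(b, a), 95),
    )


# Toy example: a 4 x 4 reference mask and the same mask shifted by 1 mm.
ref = {(x, y) for x in range(4) for y in range(4)}
auto = {(x + 1, y) for x in range(4) for y in range(4)}

print(dice(ref, auto))  # 0.75
print(hd95(ref, auto))  # 1.0
```

The toy masks overlap well but are offset at one border, and that offset is what HD95 reports. This mirrors the observation above that a locally poor border can produce a large HD95 even when DSC remains high.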