Z. R. Li1, L. Yan2, Z. Leng1, X. Liu3, F. Zhaodong2, and Q. Zhou1; 1Department of Research Algorithms, Manteia Technologies Co., Ltd, Xiamen, Fujian, China, 2Fujian Medical University Cancer Hospital, Fuzhou, Fujian, China, 3Manteia Technologies Co., Ltd, Xiamen, Fujian, China
Purpose/Objective(s): Deep learning (DL) auto-segmentation approaches have gained prominence in delineating regions of interest (ROIs) in radiotherapy planning. However, the effectiveness of DL models is hindered by the inherent ambiguity of ROI boundaries in medical images, exacerbated by distributional disparities between training data and real-world clinical images, which degrade performance in clinical applications. To address this, we propose an algorithm for computing boundary confidence that leverages multiple DL models to minimize overall variance and collectively alleviate the ambiguity associated with ROI boundaries.
Materials/Methods: The proposed algorithm combines diverse DL models, each with a distinct algorithmic design and trained on a disparate dataset, with a medical image confidence algorithm for ROI boundary computation. First, each DL model independently predicts a segmentation contour of the ROI. These predictions are then combined by intersecting and unioning the segmentation contours of the individual models. Because the individual models are intentionally designed to be uncorrelated or weakly correlated, the ensemble variance decreases, reducing the overall prediction error. Subsequently, a trimap is generated by marginally shrinking the intersection contour and expanding the union contour. This trimap serves as an initial segmentation for the medical image confidence algorithm, identifying the image voxels belonging to the foreground, background, and unknown regions of the ROI. The confidence algorithm then estimates the confidence with which each voxel in the unknown region belongs to the ROI and determines the final segmentation result (see the sketch after the Conclusion).

Results: We validated the algorithm's effectiveness using CT images from 62 patients with nasopharyngeal carcinoma. Each patient's data comprised one manually delineated gross tumor volume (GTV) and GTV segmentation results from various DL models. Our method significantly improved the segmentation results through the integration of confidence images derived from multiple DL outputs. The average Dice coefficient across the DL models for all patients was 0.62; however, some patients had Dice values substantially lower than this mean. With our algorithm, patients whose Dice scores were initially below 0.4 experienced an average improvement of 60%, while those with Dice scores originally below 0.3 saw their Dice values increase by an average of 273%.

Conclusion: The proposed multi-model framework reduces the variance of the prediction error relative to individual models, leading to improved generalization and robustness, especially for ROIs with ambiguous boundaries. This improved segmentation accuracy for ambiguous boundaries can significantly impact clinical practice for challenging cases and increase the generalizability of DL models within an automatic workflow.
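The following is a minimal sketch of the described workflow, not the authors' implementation. The specific medical image confidence algorithm is not given in the abstract, so simple model-agreement voting stands in for it; the function names, the shrink/grow iteration counts, and the 0.5 vote threshold are illustrative assumptions. The inputs are assumed to be per-model binary GTV masks resampled to a common voxel grid.

```python
import numpy as np
from scipy.ndimage import binary_dilation, binary_erosion


def build_trimap(masks, shrink_iter=2, grow_iter=2):
    """Combine per-model binary masks into a foreground/background/unknown trimap.

    masks: list of binary numpy arrays (one per DL model), same shape.
    Returns an int8 array: 1 = foreground, 0 = background, -1 = unknown.
    """
    stack = np.stack([m.astype(bool) for m in masks])
    intersection = stack.all(axis=0)   # voxels all models call ROI
    union = stack.any(axis=0)          # voxels any model calls ROI

    # Marginally shrink the intersection and expand the union, as described
    # above, to define a confident core and a generous outer bound.
    core = binary_erosion(intersection, iterations=shrink_iter)
    outer = binary_dilation(union, iterations=grow_iter)

    trimap = np.full(core.shape, -1, dtype=np.int8)  # default: unknown band
    trimap[core] = 1                                  # confident foreground
    trimap[~outer] = 0                                # confident background
    return trimap


def resolve_unknown(trimap, masks, threshold=0.5):
    """Assign unknown voxels by a simple confidence score (model agreement).

    This is a stand-in for the confidence algorithm: the confidence of an
    unknown voxel is the fraction of models labeling it as ROI.
    """
    vote = np.mean(np.stack([m.astype(float) for m in masks]), axis=0)
    final = (trimap == 1)
    unknown = (trimap == -1)
    final[unknown] = vote[unknown] >= threshold
    return final.astype(np.uint8)
```

In use, `masks` would hold the thresholded outputs of the individual DL models for one CT volume, and the returned array would be the final GTV segmentation under these assumptions.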
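For reference, the Dice coefficient reported in the Results has the standard definition below; this is a generic sketch, not the evaluation code used in the study.

```python
import numpy as np


def dice_coefficient(pred, truth, eps=1e-7):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks pred and truth."""
    pred = pred.astype(bool)
    truth = truth.astype(bool)
    intersection = np.logical_and(pred, truth).sum()
    return 2.0 * intersection / (pred.sum() + truth.sum() + eps)
```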