Memorial Sloan Kettering Cancer Center, New York, NY
S. Elguindi1, N. Y. Lee2, Y. Yu2, J. Jiang1, J. O. Deasy1, and H. Veeraraghavan1; 1Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, NY, 2Department of Radiation Oncology, Memorial Sloan Kettering Cancer Center, New York, NY
Purpose/Objective(s): Artificial intelligence (AI)-assisted contouring of organs at risk (OARs) and lymph nodal stations is provided by multiple vendors, but comprehensive clinical target volumes (CTVs) that encompass the primary disease and all subclinical regions are not yet available. We aimed to explore the feasibility of generating complete CTVs for use in radiation therapy planning with an in-house transformer architecture, using a few-shot learning approach based on a foundation model.
Materials/Methods: Seventy-three patients with primary oropharyngeal cancer in either the base of tongue (BOT) or the tonsil were identified. The gross tumor volume (GTV) of the primary tumor and all gross nodes, along with subclinical regions deemed at risk, were combined to create the final CTV. Each cancer type was further subdivided into right- and left-sided primary disease to distinguish the contouring style of each, resulting in four curated training sets from our institution’s experts. A transformer architecture was first pre-trained using an unsupervised learning approach on 10,412 CT images. We then fine-tuned the model on each of the four subdivided datasets using 12, 18, 23, and 20 patient examples for training of the left-side tonsil, right-side tonsil, left-side BOT, and right-side BOT, respectively. Hyperparameter settings were fixed at 1.5 mm isotropic voxel spacing, a window-level HU range of -750 to 1750, and a batch size of 1 for a total of 2000 epochs. Ten percent of the training set was used as validation to determine the best model weights based on average volumetric Dice score.
Results: The transformer model reached convergence on the validation dataset in an average of 12.54 hours and in under 1000 epochs. Quantitative comparison on an additional testing cohort of 20 patients, combined over all model subtypes, showed an average volumetric Dice score of 0.74 ± 0.04 [0.61 – 0.81] and an average surface Dice score with an applied 3 mm tolerance of 0.71 ± 0.06 [0.56 – 0.82].
Conclusion: We demonstrate the feasibility of using a foundation model to generate comprehensive AI-assisted CTVs for oropharyngeal cancers using a few-shot learning approach. In some cases, the model achieved over 80% surface agreement on unseen data with as few as 12 training examples. We aim to improve the performance of this technique by increasing the training set size and diversity. Lastly, we will integrate multimodal MRI and PET imaging to delineate the gross tumor volume more directly.
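As an illustration of two quantities named in the methods, the following is a minimal Python sketch of HU windowing to the stated range of -750 to 1750 and of the volumetric Dice score between a predicted and a reference CTV mask. It is not the authors' implementation: the function names `window_hu` and `volumetric_dice`, the rescaling of windowed intensities to [0, 1], and the convention of returning 1.0 when both masks are empty are all assumptions made for this example.

```python
import numpy as np


def window_hu(ct_hu, lo=-750.0, hi=1750.0):
    """Clip CT intensities to [lo, hi] HU and rescale linearly to [0, 1].

    The HU range comes from the abstract; the rescaling to [0, 1] is an
    assumption of this sketch, not a stated detail of the method.
    """
    clipped = np.clip(np.asarray(ct_hu, dtype=np.float32), lo, hi)
    return (clipped - lo) / (hi - lo)


def volumetric_dice(pred, ref):
    """Volumetric Dice score, 2|A ∩ B| / (|A| + |B|), for two binary masks."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    denom = pred.sum() + ref.sum()
    if denom == 0:
        # Assumed convention: two empty masks agree perfectly.
        return 1.0
    return 2.0 * np.logical_and(pred, ref).sum() / denom


if __name__ == "__main__":
    # Toy 4x4x4 masks: each covers two slices, and they share one slice.
    pred = np.zeros((4, 4, 4), dtype=bool)
    ref = np.zeros((4, 4, 4), dtype=bool)
    pred[:2] = True
    ref[1:3] = True
    print("Dice:", volumetric_dice(pred, ref))
    print("Windowed:", window_hu([-1000, -750, 500, 1750, 3000]))
```

The surface Dice score at a 3 mm tolerance reported in the results additionally requires surface extraction and distance computation in physical units, which this sketch does not attempt.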