German Oncology Center Freiburg, Baden-Württemberg, Germany
A. Christoforou1, T. Sprave2, R. G. Stoian2, S. K. B. Spohn2, A. Ruehle2,3, N. H. Nicolay2,4, T. Fechter2, I. Popp5, I. Strouthos1, A. Grosu5, and C. Zamboglou1,2; 1German Oncology Center, Limassol, Cyprus, 2Department of Radiation Oncology – University Medical Center Freiburg, Freiburg, Germany, 3Department of Radiation Oncology, University Hospital Leipzig, Leipzig, Germany, 4Department of Radiation Oncology, University of Leipzig Medical Center, Leipzig, Germany, 5Department of Radiation Oncology, Medical Center - University of Freiburg, Faculty of Medicine, Freiburg, Germany
Purpose/Objective(s): The implementation of synthetic data (SD) might enhance clinical research in radiation therapy (RT) while protecting patient privacy. In the era of artificial intelligence in particular, large datasets are needed for outcome prediction modelling. The best methodology for creating SD in the field of RT and for assessing its performance is unknown. Consequently, the aim of this work was to create a framework for SD creation and evaluation with machine-learning (ML) based models and to assess its application in data-sharing scenarios. Materials/
Methods: Five retrospective survival-related datasets (n=1038 recurrent prostate cancer, n=109 primary localised prostate cancer, n=46 metastasised prostate cancer, n=1072 head and neck cancer, n=298 gliomas) of patients undergoing RT in different scenarios were collected. Four different ML-based models (Tabular Variational Autoencoder (TVAE), Conditional Tabular Generative Adversarial Network, Gaussian Copula, Hybrid Copula Generative Adversarial Network) were applied to create multiple iterations of SD for each dataset, each the same size as the original dataset. Subsequently, the SD datasets were compared with their original counterparts, with initial exclusion criteria including p<0.05 in the log-rank test and >5% exact data row matches. From the remaining candidates, the most suitable SD were chosen based on a variety of metrics, including the concordance index and a comparison of hazard ratios from multivariate Cox proportional hazards models. Results: There was no significant difference (p=0.704, Student's t-test) in the concordance indices between the best-performing SD and their respective original counterparts. The hazard ratios obtained from fitting the SD were consistently within the corresponding 95% confidence intervals of the original data, suggesting strong explainability. Most SD sets had 0% exact matches (with numeric tolerance) compared with their original counterparts. The TVAE method consistently produced the best results, and all final SD sets were generated with it. Conclusion: We present a framework, comprising methods and metrics, to create SD and to evaluate its performance in the field of RT. The final SD sets that were chosen achieved a high level of privacy, explainability and predictive efficacy. This showcases the strong recent advancements of synthetic data-generating ML models, especially TVAEs.
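The initial exclusion step named in the Materials/Methods (log-rank p<0.05 or >5% exact row matches) could be sketched roughly as follows. This is an illustrative Python sketch using only the standard library, not the authors' implementation: the function names, the tuple-based row representation, and the tolerance value are assumptions, and the abstract does not state which survival groups were compared or how the numeric tolerance was defined. Here the log-rank test compares the original cohort with a synthetic cohort.

```python
import math


def exact_match_fraction(synthetic_rows, original_rows, tol=1e-6):
    """Fraction of synthetic rows that reproduce some original row.

    Numeric fields match within an absolute tolerance `tol` (assumed value);
    all other fields must be equal.
    """
    def same(a, b):
        if isinstance(a, (int, float)) and isinstance(b, (int, float)):
            return math.isclose(a, b, rel_tol=0.0, abs_tol=tol)
        return a == b

    def row_matches(s, o):
        return len(s) == len(o) and all(same(a, b) for a, b in zip(s, o))

    if not synthetic_rows:
        return 0.0
    hits = sum(any(row_matches(s, o) for o in original_rows)
               for s in synthetic_rows)
    return hits / len(synthetic_rows)


def logrank_p_value(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test p-value (chi-square, 1 degree of freedom)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Subjects still at risk just before time t, per group.
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        n_b = sum(1 for u, _, g in data if u >= t and g == 1)
        # Events occurring exactly at time t, per group.
        d_a = sum(1 for u, e, g in data if u == t and e and g == 0)
        d_b = sum(1 for u, e, g in data if u == t and e and g == 1)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)

    if variance == 0.0:
        return 1.0
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_screening(p_value, match_fraction, alpha=0.05, max_match=0.05):
    """Keep a synthetic dataset unless p < alpha or > max_match rows match."""
    return p_value >= alpha and match_fraction <= max_match
```

A synthetic dataset that passes this screen would then go on to the later comparisons (concordance index, hazard ratios against the original 95% confidence intervals), which are not covered by this sketch.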