Detroit Medical Center/Wayne State University, Detroit, MI
J. Jeberaeel1, A. L. Kieft2, R. Boggula3, and S. R. Miller II4; 1Karmanos Cancer Center - Wayne State University, Detroit, MI, 2Detroit Medical Center - Wayne State University, Detroit, MI, 3Karmanos Cancer Institute, Detroit, MI, 4Wayne State University, Detroit, MI
Purpose/Objective(s): To evaluate the ability of an OpenAI-based chatbot to retrieve relevant patient information from physician consultation notes.
Materials/Methods: 10 unique questions were assessed on 10 diverse patient notes covering various cancer treatment sites. Sample questions included asking the AI chatbot for a patient's name, age, gender, date of birth, presenting symptoms, diagnosis, date of diagnosis, previous medical and surgical history, familial oncologic history, previous oncologic treatment, KPS score, treatment recommendations, and whether the patient smokes and is continuing to smoke. The questions were designed to represent the information realistically needed for Radiation Oncology chart rounds, tumor boards, and physician handoffs. A chatbot system was developed using OpenAI's GPT-4 model to generate responses (a minimal sketch of this retrieval step follows the abstract). A custom rubric with ratings from 1 to 5 (1 = strongly disagree, 5 = strongly agree) was developed across several criteria: accuracy of content, completeness, context awareness, transparency of limitations, and lack of bias. Descriptive statistics were performed to evaluate the overall performance of the system (an illustrative scoring sketch also follows the abstract).

Results: The chatbot achieved an overall average rubric score of 4.5 (± 0.9) out of 5 across all criteria. The mean score (standard deviation) was 4.4 (± 1.1) for accuracy of content, 4.4 (± 1.1) for completeness, 4.4 (± 1.0) for context awareness, 4.5 (± 0.9) for transparency of limitations, and 4.7 (± 0.5) for lack of bias. The lack of bias criterion received the highest average score and the lowest standard deviation, indicating strong consensus on the system's ability to provide unbiased responses. In contrast, accuracy of content and completeness had the highest standard deviations, suggesting more variability in the responses and identifying these criteria as potential focal points for enhancement.

Conclusion: Our research indicates that OpenAI's chatbot is a useful tool, specifically for retrieving patient information from consultation notes for tumor boards, chart rounds, and physician handoffs. However, despite the excitement around using AI in clinical settings, it is crucial to conduct a comprehensive evaluation before implementation to fully understand the limitations of such systems. A rubric similar to ours can provide valuable insights into the strengths and weaknesses of these systems.
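The abstract does not report implementation details for the retrieval step. The following is a minimal sketch, assuming the openai Python client (v1.x), an API key in the environment, and a hypothetical plain-text consult note; it is an illustration of the general approach, not the authors' actual system.

    # Minimal sketch of the retrieval step described in Materials/Methods.
    # Assumptions (not from the abstract): openai Python client v1.x,
    # OPENAI_API_KEY set in the environment, notes as plain-text strings.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    SYSTEM_PROMPT = (
        "You are an assistant that answers questions strictly from the "
        "consultation note provided. If the note does not contain the "
        "answer, say so rather than guessing."
    )

    def ask_note(note_text: str, question: str) -> str:
        """Ask one chart-round-style question against one consult note."""
        response = client.chat.completions.create(
            model="gpt-4",
            temperature=0,  # deterministic extraction, not creative text
            messages=[
                {"role": "system", "content": SYSTEM_PROMPT},
                {"role": "user",
                 "content": f"Note:\n{note_text}\n\nQuestion: {question}"},
            ],
        )
        return response.choices[0].message.content

    # Example usage with a question type from the list above:
    # ask_note(note, "What is the patient's KPS score?")

Pinning temperature to 0 and instructing the model to refuse unsupported answers reflects the study's emphasis on accuracy of content and transparency of limitations.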
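The descriptive statistics reported in Results (per-criterion mean and standard deviation) can be reproduced with a few lines of Python. The scores below are illustrative placeholders, not the study data.

    # Illustrative sketch of the rubric scoring described in Materials/Methods.
    # Scores are made-up placeholders; the study's real ratings are not public.
    from statistics import mean, stdev

    criteria = {
        "accuracy of content":         [5, 4, 5, 3, 5],
        "completeness":                [4, 5, 5, 4, 4],
        "context awareness":           [5, 4, 4, 5, 4],
        "transparency of limitations": [5, 5, 4, 4, 5],
        "lack of bias":                [5, 5, 4, 5, 5],
    }

    for name, scores in criteria.items():
        print(f"{name}: {mean(scores):.1f} (± {stdev(scores):.1f})")

    # Pool all ratings for the overall score across criteria
    all_scores = [s for scores in criteria.values() for s in scores]
    print(f"overall: {mean(all_scores):.1f} (± {stdev(all_scores):.1f})")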