PQA 09 - Hematologic Malignancies and Digital Health Innovations Poster Q&A
3374 - Pathology AI for Risk Stratification (PAIRS): Predicting Adverse Pathological Features in Post-Operative Head and Neck Cancer with Large Language Models
J. Grandinetti, C. Friedes, R. J. L. Maxwell, Z. Belal, J. N. Lukens, A. Lin, W. R. Green, and R. McBeth; Department of Radiation Oncology, University of Pennsylvania, Philadelphia, PA
Purpose/Objective(s): Stratifying patients into risk categories based on pathology reports is a critical task in post-operative head and neck cancer care, yet it is resource-intensive and susceptible to human error. This study introduces an approach that uses large language models (LLMs) to automate the extraction and prediction of staging and risk factors from pathology reports, with the aim of streamlining the clinical workflow for head and neck cancer patients. Materials/Methods: We developed an LLM framework called Pathology AI for Risk Stratification (PAIRS), which employs an agent-based approach with concurrent computing for efficient and accurate risk factor identification. PAIRS uses a HIPAA-compliant Azure/OpenAI GPT4 endpoint, with GPT4-Vision providing optical character recognition for scanned pathology records. To retain critical staging features while minimizing inference time, we implemented an extraction-abstraction framework that uses prompt engineering techniques for multi-level summarization. PAIRS was used to identify and predict primary tumor staging, nodal involvement, and overall risk, and to generate succinct ‘one-liner’ summaries, for head and neck cancer patients who had undergone definitive surgery and been referred to the radiation oncology department. Adverse pathologic features were divided into two groups. Intermediate-risk factors were pT3-4 disease, multiple positive lymph nodes (LNs), LN size >3 cm, lymphovascular invasion (LVI), perineural invasion (PNI), and close margins. High-risk factors were positive surgical margins (PM) or extracapsular nodal extension (ENE). Predictions made by PAIRS were compared against physician-reviewed reports, which served as the ground-truth benchmark.
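The risk factors listed above can be read as a simple rule set, sketched below in Python. This is an illustrative assumption about how the categories combine, not the PAIRS implementation: the `PathologyFeatures` fields, the `risk_category` function, and the label used when no adverse feature is present are all hypothetical names introduced for this example, and in PAIRS the features are extracted by the LLM rather than supplied by hand.

```python
from dataclasses import dataclass


@dataclass
class PathologyFeatures:
    """Hypothetical container for features extracted from one pathology report."""
    pt_stage: int            # pathologic T stage as an integer, 1-4
    positive_nodes: int      # number of positive lymph nodes
    largest_node_cm: float   # size of the largest involved node, in cm
    lvi: bool                # lymphovascular invasion
    pni: bool                # perineural invasion
    close_margin: bool       # close surgical margin
    positive_margin: bool    # positive surgical margin (PM)
    ene: bool                # extracapsular nodal extension (ENE)


def risk_category(f: PathologyFeatures) -> str:
    """Map extracted features to a risk label using the factors named in the abstract."""
    # High-risk factors take precedence over intermediate ones.
    if f.positive_margin or f.ene:
        return "high"
    intermediate = (
        f.pt_stage >= 3              # pT3-4
        or f.positive_nodes >= 2     # multiple positive LNs
        or f.largest_node_cm > 3.0   # LN size > 3 cm
        or f.lvi
        or f.pni
        or f.close_margin
    )
    # The label for the no-adverse-feature case is an assumption of this sketch.
    return "intermediate" if intermediate else "no_adverse_features"


example = PathologyFeatures(
    pt_stage=2, positive_nodes=1, largest_node_cm=1.5,
    lvi=False, pni=True, close_margin=False,
    positive_margin=False, ene=False,
)
print(risk_category(example))
```

Checking the high-risk factors first reflects the assumption that a single high-risk feature determines the category regardless of any intermediate-risk features.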
Because the medical summaries did not adhere to a standardized template, we assessed how closely the generated summaries matched the physician-prepared summaries using the Bidirectional Encoder Representations from Transformers Score (BERTScore), which measures similarity based on semantic equivalence rather than exact word matching. Results: PAIRS was tested on 12 patients and achieved 100% accuracy in identifying and predicting primary tumor staging, nodal involvement, and overall risk category. Owing to the concurrent architecture, each patient was processed in under 30 seconds. The BERTScore recall, precision, and F1 of the generated summaries averaged 89% across all three metrics, indicating close agreement with the physician-prepared summaries. Conclusion: The PAIRS framework presents a robust and efficient means of automating risk classification among post-operative head and neck cancer patients. Our results highlight its potential to significantly enhance clinical workflow efficiency. The framework could also be extended to automate the compilation and generation of oncologic history documents.
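For readers unfamiliar with the summary metric, the sketch below illustrates the greedy-matching step behind BERTScore precision, recall, and F1. It is a simplified illustration under stated assumptions: the two-dimensional vectors are toy stand-ins for contextual token embeddings, the function names are hypothetical, and the optional idf weighting and baseline rescaling of the full metric are omitted. It is not the evaluation code used in the study.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def greedy_match_scores(candidate, reference):
    """Return (precision, recall, f1) from greedy token matching.

    candidate, reference: non-empty lists of token embedding vectors.
    Precision averages, over candidate tokens, the best similarity to any
    reference token; recall does the same from the reference side.
    """
    precision = sum(
        max(cosine(c, r) for r in reference) for c in candidate
    ) / len(candidate)
    recall = sum(
        max(cosine(r, c) for c in candidate) for r in reference
    ) / len(reference)
    total = precision + recall
    f1 = 2 * precision * recall / total if total > 0 else 0.0
    return precision, recall, f1


# Toy embeddings (assumed values): the generated summary covers two of the
# three reference tokens exactly and the third only partially.
generated = [[1.0, 0.0], [0.0, 1.0]]
physician = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
p, r, f1 = greedy_match_scores(generated, physician)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because matching is done on embedding similarity rather than on exact tokens, two summaries phrased differently can still score highly, which is why the metric suits free-text summaries without a fixed template.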