DOI
https://doi.org/10.25772/WWY7-DR92
Author ORCID Identifier
https://orcid.org/0000-0002-5339-4619
Defense Date
2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy
Department
Pharmaceutical Sciences
First Advisor
Dayanjan S Wijesinghe
Abstract
This dissertation presents a systematic evaluation of PyZoBot, an AI-powered platform for literature-based question answering, using the Retrieval-Augmented Generation Assessment Scores (RAGAS) framework. The study focuses on a subset of 49 cardiology-related questions extracted from the BioASQ benchmark dataset. PyZoBot's performance was assessed across 32 configurations, including standard Retrieval-Augmented Generation (RAG) and GraphRAG pipelines, implemented with both OpenAI-based models (GPT-3.5-Turbo, GPT-4o) and open-source models (LLaMA 3.1, Mistral).
To establish a comparative benchmark, responses generated by PyZoBot were evaluated alongside answers manually written by six PhD students and recent graduates from the pharmacotherapy field, using a curated Zotero library containing BioASQ-referenced documents. The evaluation applied four key RAGAS metrics—faithfulness, answer relevancy, context recall, and context precision—along with a composite harmonic score to determine overall performance.
The findings reveal that 22 PyZoBot configurations surpassed the highest-performing human participant, with the top pipeline (GPT-3.5-Turbo + layout-aware chunking, k=10) achieving a harmonic RAGAS score of 0.6944. Statistical analysis using Kruskal-Wallis and Dunn’s post hoc tests confirmed significant differences across all metrics, especially in faithfulness and time efficiency.
These results validate PyZoBot’s ability to support high-quality biomedical information synthesis and demonstrate the system’s potential to meet or exceed human performance in complex, evidence-based academic tasks.
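The composite harmonic score mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact formula, so this assumes an unweighted harmonic mean of the four RAGAS metrics, each in the range 0 to 1. The function name `composite_ragas_score` and the example inputs are hypothetical and are not taken from the dissertation.

```python
from statistics import harmonic_mean


def composite_ragas_score(faithfulness, answer_relevancy,
                          context_recall, context_precision):
    """Combine four RAGAS metrics into one score.

    Assumption: an unweighted harmonic mean; the dissertation may
    define its composite score differently.
    """
    metrics = [faithfulness, answer_relevancy,
               context_recall, context_precision]
    for value in metrics:
        if not 0.0 <= value <= 1.0:
            raise ValueError("each metric must lie in [0, 1]")
    # The harmonic mean is pulled toward the lowest metric, so one
    # weak metric lowers the composite more than an arithmetic mean would.
    return harmonic_mean(metrics)


if __name__ == "__main__":
    # Hypothetical metric values, for illustration only.
    print(composite_ragas_score(1.0, 1.0, 1.0, 0.25))
```

Under this assumption, a configuration scores well only when all four metrics are reasonably high, because a single low metric dominates the result.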
Rights
© Suad Alshammari, 2025
Is Part Of
VCU University Archives
Is Part Of
VCU Theses and Dissertations
Date of Submission
4-24-2025