Defense Date
2025
Document Type
Thesis
Degree Name
Master of Science
Department
Computer Science
First Advisor
Dr. Bridget McInnes
Abstract
With the increasing volume of structured and unstructured data, obtaining reliable information efficiently has become crucial. In the biomedical domain, new research studies are published at an ever faster pace, so extracting information from scientific papers is essential for staying up to date with accurate findings. This work focuses on identifying relations between entities extracted from the titles and abstracts of biomedical research papers. We developed a Retrieval-Augmented Generation (RAG)-based system to automatically identify relations between biomedical entities. We evaluate multiple open-source Large Language Models (LLMs) and the number of in-context examples (shots) needed to improve their results. We evaluate our methods using precision, recall, and F1 score, and compare our approach to a traditional deep learning baseline that combines DeBERTa with a Convolutional Neural Network (CNN). Our results show that, at 10 shots, the Qwen-3 reasoning model with RAG obtained a higher macro F1 score than both the baseline and the other LLMs using RAG. It performed best at 25 shots, where it produced the fewest hallucinated labels and achieved high macro and micro F1 scores.
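The abstract compares models by macro and micro F1 and by the number of hallucinated labels. The following is a minimal sketch of how these quantities can be computed. It assumes single-label relation classification and counts any predicted label outside the allowed label set as a hallucination. This is an illustrative assumption, not necessarily the scoring used in the thesis. The function names and relation labels are hypothetical.

```python
from collections import Counter


def prf(tp, fp, fn):
    # Precision, recall, and F1 from raw counts; 0.0 whenever a ratio is undefined.
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def micro_macro_f1(gold, pred, labels):
    """Return (micro_f1, macro_f1, n_hallucinated) over the valid label set.

    A prediction outside `labels` is counted as hallucinated: it is a false
    negative for the gold label and is not credited to any valid label.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    hallucinated = 0
    for g, p in zip(gold, pred):
        if p not in labels:
            hallucinated += 1
        if g == p:
            tp[g] += 1
        else:
            fn[g] += 1
            if p in labels:
                fp[p] += 1
    # Macro: average per-label F1, so every relation type counts equally.
    macro = sum(prf(tp[l], fp[l], fn[l])[2] for l in labels) / len(labels)
    # Micro: pool counts across labels, so frequent relation types dominate.
    micro = prf(sum(tp.values()), sum(fp.values()), sum(fn.values()))[2]
    return micro, macro, hallucinated


# Hypothetical toy example; "upregulates" is not in the label set.
labels = ["activates", "inhibits", "binds"]
gold = ["inhibits", "activates", "binds", "inhibits"]
pred = ["inhibits", "binds", "binds", "upregulates"]
micro, macro, n_hall = micro_macro_f1(gold, pred, labels)
print(micro, macro, n_hall)
```

For this toy example, micro F1 is 4/7 (about 0.57), macro F1 is 4/9 (about 0.44), and one label is hallucinated. Macro F1 is lower here because the rare "activates" class, which is never predicted correctly, weighs as much as the other classes.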
Rights
© The Author
Is Part Of
VCU University Archives
Is Part Of
VCU Theses and Dissertations
Date of Submission
12-12-2025