Defense Date

2025

Document Type

Thesis

Degree Name

Master of Science

Department

Computer Science

First Advisor

Dr. Bridget McInnes

Abstract

With the growing volume of structured and unstructured data, obtaining reliable information efficiently has become crucial. In the biomedical domain, new research studies are published at an increasing pace, so extracting information from scientific papers is essential for staying up to date with accurate information. This work focuses on identifying relationships between entities extracted from the titles and abstracts of biomedical research papers. We developed a Retrieval-Augmented Generation (RAG) based system to automatically identify relations between biomedical entities. We evaluate multiple open-source Large Language Models (LLMs) and the number of examples (shots) required to improve the LLMs' results. We evaluate our methods using precision, recall, and F1 scores, and compare our approach to a traditional deep learning baseline that combines DeBERTa with a Convolutional Neural Network (CNN). Our results show that, at 10 shots, the Qwen-3 Reasoning model with the RAG approach obtained a higher macro F1 than both the baseline and the other LLMs using RAG. It performed best at 25 shots, where it produced the fewest hallucinated labels and achieved high macro and micro scores.

Rights

© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission

12-12-2025
