DOI
https://doi.org/10.25772/MVBF-2R57
Defense Date
2025
Document Type
Thesis
Degree Name
Master of Science
Department
Computer Science
First Advisor
Dr. Bridget T. McInnes
Abstract
Information Extraction (IE) is a fundamental task in Natural Language Processing (NLP), involving the identification of structured information from unstructured text. Two core components of IE—Named Entity Recognition (NER) and Relation Extraction (RE)—are widely used to extract key concepts and the relationships between them across various domains. However, the sequential dependency of RE on the output of NER makes it vulnerable to error propagation: inaccuracies in entity recognition can negatively affect downstream relation extraction.
To mitigate this issue, Multitask Learning (MTL) has been proposed as an approach that jointly models NER and RE, aiming to improve overall performance and reduce error propagation between tasks. In this thesis, we explore the application of MTL to the analysis of chemical reaction patents, comparing its performance to traditional single-task learning models. Furthermore, we evaluate two MTL training strategies—fully simultaneous (MT-FS) and interleaved Round Robin (MT-RR)—to determine which yields more accurate and robust results. Evaluation is performed using standard metrics such as precision, recall, and F1 score.
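To make the two training strategies concrete, the following is a minimal sketch of how MT-FS and MT-RR differ in their update schedules. It is an illustration only, not the implementation used in the thesis: the toy shared-encoder model, the batch format, and all hyperparameters are assumed for demonstration, and the F1 helper simply spells out the standard metric definitions named above.

import torch
import torch.nn as nn

class SharedEncoderModel(nn.Module):
    """Hypothetical joint model: one shared encoder with separate NER and RE heads."""
    def __init__(self, input_dim=32, hidden_dim=64, n_ner_tags=5, n_relations=3):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(input_dim, hidden_dim), nn.ReLU())
        self.ner_head = nn.Linear(hidden_dim, n_ner_tags)
        self.re_head = nn.Linear(hidden_dim, n_relations)

    def forward(self, x, task):
        h = self.encoder(x)
        return self.ner_head(h) if task == "ner" else self.re_head(h)

def train_mt_fs(model, ner_batches, re_batches, lr=1e-3):
    # MT-FS (fully simultaneous): one combined loss per step, so gradients
    # from both tasks flow through the shared encoder in the same update.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    for (xn, yn), (xr, yr) in zip(ner_batches, re_batches):
        opt.zero_grad()
        loss = ce(model(xn, "ner"), yn) + ce(model(xr, "re"), yr)
        loss.backward()
        opt.step()

def train_mt_rr(model, ner_batches, re_batches, lr=1e-3):
    # MT-RR (interleaved round robin): the tasks alternate, so each step
    # updates the shared encoder from a single task's loss.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    ce = nn.CrossEntropyLoss()
    for (xn, yn), (xr, yr) in zip(ner_batches, re_batches):
        for x, y, task in ((xn, yn, "ner"), (xr, yr, "re")):
            opt.zero_grad()
            ce(model(x, task), y).backward()
            opt.step()

def f1_score(tp, fp, fn):
    # Standard definitions of the evaluation metrics named above.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

if __name__ == "__main__":
    torch.manual_seed(0)
    ner = [(torch.randn(8, 32), torch.randint(0, 5, (8,))) for _ in range(10)]
    re = [(torch.randn(8, 32), torch.randint(0, 3, (8,))) for _ in range(10)]
    train_mt_fs(SharedEncoderModel(), ner, re)
    train_mt_rr(SharedEncoderModel(), ner, re)

The only difference between the two training functions is the update schedule; the model, data, and losses are identical, which is the comparison the thesis evaluates.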
The results show that MTL models outperform single-task models for NER, particularly on sparse or ambiguous entity types such as Temperature and Time. For RE, however, neither MTL model performs measurably better or worse than the single-task model. These findings suggest that MTL offers notable advantages for NER and can be competitive for RE when task interaction is carefully managed.
Rights
© Adrienne D. Hembrick
Is Part Of
VCU University Archives
Is Part Of
VCU Theses and Dissertations
Date of Submission
5-8-2025