DOI

https://doi.org/10.25772/MVBF-2R57

Defense Date

2025

Document Type

Thesis

Degree Name

Master of Science

Department

Computer Science

First Advisor

Dr. Bridget T. McInnes

Abstract

Information Extraction (IE) is a fundamental task in Natural Language Processing (NLP), involving the identification of structured information from unstructured text. Two core components of IE—Named Entity Recognition (NER) and Relation Extraction (RE)—are widely used to extract key concepts and the relationships between them across various domains. However, the sequential dependency of RE on the output of NER makes it vulnerable to error propagation: inaccuracies in entity recognition can negatively affect downstream relation extraction.

To mitigate this issue, Multitask Learning (MTL) has been proposed as an approach that jointly models NER and RE, aiming to improve overall performance and reduce error propagation between tasks. In this thesis, we explore the application of MTL to the analysis of chemical reaction patents, comparing its performance to traditional single-task learning models. Furthermore, we evaluate two MTL training strategies—fully simultaneous (MT-FS) and interleaved Round Robin (MT-RR)—to determine which yields more accurate and robust results. Evaluation is performed using standard metrics such as precision, recall, and F1 score.
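The two training schedules compared here can be illustrated with a toy sketch (not the thesis code): both strategies update a shared parameter, but MT-FS steps on the combined NER+RE loss every iteration, while MT-RR alternates between the two task losses. The quadratic losses below are hypothetical stand-ins for the real NER and RE objectives.

```python
def ner_loss(w):  # toy stand-in for the NER objective
    return (w - 1.0) ** 2

def re_loss(w):   # toy stand-in for the RE objective
    return (w + 1.0) ** 2

def grad(f, w, eps=1e-6):
    # central finite-difference gradient, enough for this illustration
    return (f(w + eps) - f(w - eps)) / (2 * eps)

def train_fs(steps=200, lr=0.05):
    """MT-FS: every step updates on the sum of both task losses."""
    w = 0.5
    for _ in range(steps):
        w -= lr * (grad(ner_loss, w) + grad(re_loss, w))
    return w

def train_rr(steps=200, lr=0.05):
    """MT-RR: steps alternate between the tasks in round-robin order."""
    w = 0.5
    tasks = [ner_loss, re_loss]
    for t in range(steps):
        w -= lr * grad(tasks[t % 2], w)
    return w
```

Both schedules drive the shared parameter toward the joint optimum, but MT-RR oscillates slightly because each step sees only one task's gradient; that difference in gradient interaction is what the thesis evaluates on real NER/RE models.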

The results show that MTL models outperform single-task models for NER, particularly on sparse or ambiguous entity types such as Temperature and Time. For RE, however, neither MTL model shows an inherent advantage or disadvantage relative to the single-task model. These findings suggest that MTL offers notable advantages for NER and can be competitive for RE when task interaction is carefully managed.

Rights

© Adrienne D. Hembrick

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission

5-8-2025
