DOI
https://doi.org/10.25772/SHWC-G011
Defense Date
2020
Document Type
Thesis
Degree Name
Master of Science
Department
Computer Science
First Advisor
Bridget McInnes
Abstract
One of the primary challenges for clinical Named Entity Recognition (NER) is the availability of annotated training data. Technical and legal hurdles prevent the creation and release of corpora related to electronic health records (EHRs). In this work, we look at the impact of pseudo-data generation on clinical NER, using gazetteering and thresholding with a neural network model. We report that gazetteers can result in the inclusion of proper terms and the exclusion of determiners and pronouns in preceding and middle positions. Gazetteers containing more terms that also appear in the original dataset had a greater impact. We also report that thresholding results in clear trend lines across the thresholds, with some values oscillating around a fixed point at the highest-confidence points.
Rights
© The Author
Is Part Of
VCU University Archives
Is Part Of
VCU Theses and Dissertations
Date of Submission
8-4-2020