DOI
https://doi.org/10.25772/NEZN-5577
Author ORCID Identifier
https://orcid.org/0009-0008-9778-2403
Defense Date
2025
Document Type
Thesis
Degree Name
Master of Science
Department
Human and Molecular Genetics
First Advisor
Scott Turner
Second Advisor
Vernell Williamson
Third Advisor
Timothy York
Fourth Advisor
Heather Creswick
Abstract
Clinical trial matching is a critical component of personalized medicine, particularly in the management of hematologic malignancies. At Virginia Commonwealth University (VCU) Health, the Molecular Diagnostics (MDX) Lab produces somatic variant reports and recommends clinical trials based on the presence of clinically significant mutations. However, the current manual trial recommendation process is time-intensive and lacks scalability.
This study introduces a computational framework to streamline and standardize clinical trial matching using natural language processing (NLP) and unsupervised clustering. Brief trial descriptions were analyzed to extract frequent terms, and trials were grouped by term similarity using joint dimensionality reduction and clustering. In parallel, patients' genomic profiles were clustered on variant data to identify groups with shared molecular features. Iterative Factor Clustering of Binary Variables (i-FCB) was used for both trial and patient clustering; cluster quality was evaluated with the Calinski-Harabasz index and cluster stability with Jaccard distance.
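The two evaluation measures named above can be illustrated with a minimal sketch. It does not implement i-FCB, and it is not the thesis code: the function names, the toy binary rows, and the cluster labels are all assumptions made for illustration. The Calinski-Harabasz index is computed here directly on raw binary rows for simplicity, whereas a factor-clustering workflow may well compute it in the reduced space.

```python
def jaccard_distance(a, b):
    """Return 1 - |A n B| / |A u B| for two collections of member IDs."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty clusters are treated as identical
    return 1.0 - len(a & b) / len(union)


def calinski_harabasz(rows, labels):
    """Ratio of between-cluster to within-cluster dispersion,
    each divided by its degrees of freedom (k - 1 and n - k)."""
    n = len(rows)
    clusters = sorted(set(labels))
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need at least 2 clusters and fewer clusters than rows")
    dim = len(rows[0])
    overall = [sum(r[j] for r in rows) / n for j in range(dim)]
    between = 0.0
    within = 0.0
    for c in clusters:
        members = [r for r, lab in zip(rows, labels) if lab == c]
        centroid = [sum(r[j] for r in members) / len(members) for j in range(dim)]
        # Between: cluster size times squared distance of centroid to overall mean.
        between += len(members) * sum(
            (centroid[j] - overall[j]) ** 2 for j in range(dim)
        )
        # Within: squared distance of each member to its own centroid.
        within += sum(
            (r[j] - centroid[j]) ** 2 for r in members for j in range(dim)
        )
    if within == 0:
        return float("inf")  # perfectly compact clusters
    return (between / (k - 1)) / (within / (n - k))


# Hypothetical binary data: 6 items described by 4 presence/absence features.
toy_rows = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
toy_labels = [0, 0, 0, 1, 1, 1]

ch_score = calinski_harabasz(toy_rows, toy_labels)

# Stability check: compare one cluster's membership across two hypothetical runs.
run_1 = {"P1", "P2", "P3"}
run_2 = {"P2", "P3", "P4"}
stability_distance = jaccard_distance(run_1, run_2)
```

A higher Calinski-Harabasz value indicates better-separated, more compact clusters, and a Jaccard distance near 0 between matched clusters from repeated or resampled runs suggests a stable cluster.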
The resulting framework organized 184 clinical trials into 10 clusters based on frequent trial terms, and 257 patients into 3 clusters based on shared genomic features, explaining 73.95% and 91.14% of variance, respectively. These data-driven clusters form the foundation for improving trial-patient matching in the MDX Lab. This study demonstrates the feasibility of scalable, computational clinical trial matching in hematologic oncology and represents the first such integration into the MDX Lab workflow. Future iterations may support faster enrollment and better alignment of patients with molecularly appropriate trials. Additionally, the NLP-derived Document-Term Matrix (DTM) serves as a lookup table that streamlines term searches within previously recommended trials and helps identify patterns in past trial assignments.
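As a rough illustration of how a binary Document-Term Matrix can double as a lookup table, the sketch below builds one from a few invented trial descriptions and queries it by term. The trial IDs, description text, stopword list, document-frequency threshold, and function names are all hypothetical; the actual preprocessing used in the thesis is not specified in this abstract.

```python
import re

# Illustrative stopword list only; a real pipeline would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "for", "with", "to", "or"}


def tokenize(text):
    """Lowercase, split on non-alphanumerics, and drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def build_binary_dtm(docs, min_doc_freq=2):
    """Build a presence/absence DTM.

    docs: dict mapping trial ID -> brief description.
    Returns (vocab, matrix), where vocab is a sorted list of terms that appear
    in at least min_doc_freq documents, and matrix maps each trial ID to a
    0/1 row aligned with vocab.
    """
    doc_terms = {tid: set(tokenize(text)) for tid, text in docs.items()}
    doc_freq = {}
    for terms in doc_terms.values():
        for term in terms:
            doc_freq[term] = doc_freq.get(term, 0) + 1
    vocab = sorted(t for t, count in doc_freq.items() if count >= min_doc_freq)
    matrix = {
        tid: [1 if term in terms else 0 for term in vocab]
        for tid, terms in doc_terms.items()
    }
    return vocab, matrix


def trials_with_term(term, vocab, matrix):
    """Look up which trials contain a given frequent term."""
    if term not in vocab:
        return []
    col = vocab.index(term)
    return sorted(tid for tid, row in matrix.items() if row[col])


# Hypothetical descriptions, invented for this example.
toy_trials = {
    "TRIAL-A": "Venetoclax with azacitidine in FLT3 mutated acute myeloid leukemia",
    "TRIAL-B": "Gilteritinib for relapsed FLT3 mutated acute myeloid leukemia",
    "TRIAL-C": "Ruxolitinib in JAK2 positive myelofibrosis",
}

vocab, dtm = build_binary_dtm(toy_trials, min_doc_freq=2)
flt3_trials = trials_with_term("flt3", vocab, dtm)
```

Because each row is binary, the same matrix could in principle feed a clustering method designed for binary variables, while column lookups like `trials_with_term` cover the term-search use case.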
Rights
© The Author
Is Part Of
VCU University Archives
Is Part Of
VCU Theses and Dissertations
Date of Submission
4-30-2025
Included in
Bioinformatics Commons, Biomedical Informatics Commons, Computational Biology Commons, Hematology Commons, Medical Genetics Commons, Oncology Commons, Pathology Commons