MACHINE LEARNING MODELS LEVERAGING PATIENT-SIMILARITY AND CLINICAL TEMPORALITY FOR DISEASE PROGNOSES
DOI
https://doi.org/10.25772/YM8C-BD27
Author ORCID Identifier
https://orcid.org/0000-0002-5508-9968
Defense Date
2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy
Department
Computer Science
First Advisor
Preetam Ghosh
Second Advisor
Thang N. Dinh
Third Advisor
Kostadin Damevski
Fourth Advisor
William C. Sleeman IV
Fifth Advisor
Rishabh Kapoor
Abstract
Electronic Health Records (EHRs) constitute a comprehensive and high-dimensional repository of clinical data, encompassing a wide array of patient-level information such as diagnoses, procedures, medications, laboratory results, and unstructured clinical narratives. These data hold immense potential for advancing predictive modeling in healthcare, including tasks such as disease progression modeling, hospital readmission prediction, and length of stay (LoS) estimation. However, the intrinsic complexity of EHR data, manifested in its heterogeneity, sparsity, and temporal dynamics, poses significant analytical challenges that limit the generalizability and interpretability of conventional machine learning models. Recent methodological advancements in deep learning and graph-based learning, particularly Graph Neural Networks (GNNs), have opened new avenues for representing and analyzing the multifaceted structure of EHR data. In this dissertation, we propose a set of enhancements to the Patient Similarity Graph (PSG) framework, a graph-based paradigm for modeling patient-patient relationships by integrating heterogeneous clinical information. Our contributions span multiple dimensions of representation learning and predictive modeling, targeting the improvement of clinical outcome predictions across various healthcare applications. We begin by benchmarking traditional machine learning models on tabular EHR representations, followed by the development of homogeneous PSGs and their use in GNN-based models for LoS prediction. Subsequently, we introduce a PageRank-based similarity framework for diagnosis prediction in both static and temporal contexts, leveraging sequential visit data. Furthermore, we conduct a comprehensive investigation into link prediction methodologies within biological networks, analyzing their effectiveness in relation to network assortativity.
Finally, we evaluate multiple data fusion strategies to construct both patient-level and visit-level similarity graphs for enhanced temporal and static prediction tasks. By integrating structured and unstructured features into unified similarity graphs, and by exploring the temporal evolution of patient trajectories, our work demonstrates the efficacy of graph-based approaches in modeling complex EHR datasets. The proposed enhancements to the PSG framework not only improve predictive performance across a range of clinical tasks but also offer a scalable and interpretable methodology for precision medicine and personalized healthcare analytics.
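As a rough illustration of the ideas named in the abstract, the sketch below builds a small patient similarity graph and ranks patients by personalized PageRank from a query patient. It is not the dissertation's method: the abstract does not state the similarity measure, the neighbor count, or the PageRank settings, so cosine similarity, k = 2, and a damping factor of 0.85 are assumptions, as are the toy feature vectors and the helper names `cosine`, `build_psg`, and `personalized_pagerank`.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def build_psg(features, k=2):
    """Link each patient to its k most similar patients (assumed k-NN rule).

    Returns a symmetric weighted adjacency dict: node -> {neighbor: weight}.
    """
    n = len(features)
    adj = {i: {} for i in range(n)}
    for i in range(n):
        sims = sorted(
            ((cosine(features[i], features[j]), j) for j in range(n) if j != i),
            reverse=True,
        )
        for s, j in sims[:k]:
            if s > 0:  # skip patients with no shared features
                adj[i][j] = s
                adj[j][i] = s
    return adj

def personalized_pagerank(adj, seed, alpha=0.85, iters=100):
    """Power iteration for PageRank with all restart mass on one seed node."""
    rank = {i: 1.0 if i == seed else 0.0 for i in adj}
    for _ in range(iters):
        new = {i: (1.0 - alpha) if i == seed else 0.0 for i in adj}
        for i, nbrs in adj.items():
            total = sum(nbrs.values())
            if total == 0:
                new[seed] += alpha * rank[i]  # isolated node: return mass to seed
                continue
            for j, w in nbrs.items():
                new[j] += alpha * rank[i] * w / total
        rank = new
    return rank

# Hypothetical binary indicators for four diagnosis codes, one row per patient.
features = [
    [1, 1, 0, 0],  # patient 0 (query patient)
    [1, 1, 1, 0],  # patient 1
    [0, 0, 1, 1],  # patient 2
    [0, 0, 0, 1],  # patient 3
    [1, 0, 0, 0],  # patient 4
]

adj = build_psg(features, k=2)
ranks = personalized_pagerank(adj, seed=0)

# Patients ordered by graph proximity to the query patient.
for patient, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(patient, round(score, 3))
```

In this toy setting, patients who share diagnosis codes with the query patient, directly or through shared neighbors, receive higher scores than patients several hops away. Such scores could in principle be used to weight neighbors' diagnoses when predicting outcomes for the query patient, though the abstract does not say how the dissertation does this.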
Rights
© The Author
Is Part Of
VCU University Archives
Is Part Of
VCU Theses and Dissertations
Date of Submission
8-6-2025
Included in
Artificial Intelligence and Robotics Commons, Clinical Trials Commons, Data Science Commons, Vital and Health Statistics Commons