DOI

https://doi.org/10.25772/YM8C-BD27

Author ORCID Identifier

https://orcid.org/0000-0002-5508-9968

Defense Date

2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Computer Science

First Advisor

Preetam Ghosh

Second Advisor

Thang N. Dinh

Third Advisor

Kostadin Damevski

Fourth Advisor

William C. Sleeman IV

Fifth Advisor

Rishabh Kapoor

Abstract

Electronic Health Records (EHRs) constitute a comprehensive and high-dimensional repository of clinical data, encompassing a wide array of patient-level information such as diagnoses, procedures, medications, laboratory results, and unstructured clinical narratives. These data hold immense potential for advancing predictive modeling in healthcare, including tasks such as disease progression modeling, hospital readmission prediction, and length of stay (LoS) estimation. However, the intrinsic complexity of EHR data—manifested in its heterogeneity, sparsity, and temporal dynamics—poses significant analytical challenges that limit the generalizability and interpretability of conventional machine learning models. Recent methodological advancements in deep learning and graph-based learning, particularly Graph Neural Networks (GNNs), have opened new avenues for representing and analyzing the multifaceted structure of EHR data. In this dissertation, we propose a set of enhancements to the Patient Similarity Graphs (PSGs) framework, a graph-based paradigm for modeling patient-patient relationships by integrating heterogeneous clinical information. Our contributions span multiple dimensions of representation learning and predictive modeling, targeting the improvement of clinical outcome predictions across various healthcare applications. We begin by benchmarking traditional machine learning models on tabular EHR representations, followed by the development of homogeneous PSGs and their use in GNN-based models for LoS prediction. Subsequently, we introduce a PageRank-based similarity framework for diagnosis prediction in both static and temporal contexts, leveraging sequential visit data. Furthermore, we conduct a comprehensive investigation into link prediction methodologies within biological networks, analyzing their effectiveness in relation to network assortativity. 
Finally, we evaluate multiple data fusion strategies to construct both patient-level and visit-level similarity graphs for enhanced temporal and static prediction tasks. By integrating structured and unstructured features into unified similarity graphs, and by exploring the temporal evolution of patient trajectories, our work demonstrates the efficacy of graph-based approaches in modeling complex EHR datasets. The proposed enhancements to the PSG framework not only improve predictive performance across a range of clinical tasks but also offer a scalable and interpretable methodology for precision medicine and personalized healthcare analytics.

Rights

© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission

8-6-2025
