DOI
https://doi.org/10.25772/2J2B-Q560
Author ORCID Identifier
https://orcid.org/0000-0002-7192-4644
Defense Date
2025
Document Type
Dissertation
Degree Name
Doctor of Philosophy
Department
Biostatistics
First Advisor
Dipankar Bandyopadhyay, Ph.D.
Abstract
The increasing availability of large-scale electronic health records (EHR) in biomedical research presents computational challenges, particularly when applying traditional statistical models that cannot handle the sheer volume and complexity of the data. Motivated by large EHR data derived from the United Network for Organ Sharing (UNOS) kidney transplantation registry, this dissertation addresses some of the challenges posed in modeling time-to-event competing risks (CR) endpoints. First, we focus on attaining scalability in marginal modeling of clustered competing risks endpoints (subjects within centers) via Cox-type regression on the cumulative incidence function. This is achieved via modification of a forward-backward scan algorithm, which drastically reduces the computational time from $\mathcal{O}(n^2)$ to $\mathcal{O}(n)$. Second, we investigate the paradigm of informative cluster size (ICS), where the size of a study cluster (here, kidney transplant centers) may be correlated with the time-to-event (here, CR) endpoints, and propose a novel nonparametric test to assess cluster informativeness. Leveraging modern empirical process theory, we rigorously establish the asymptotic properties of this test. Third, acknowledging the limitations of the traditional Cox regression framework, we consider a semiparametric weighted least squares accelerated failure time model for both independent and clustered CR data, where the logarithm of the CR failure times is linearly related to a set of covariates. The associated computational bottleneck (due to $n \gg p$) is mitigated via a divide-and-combine approach, in which estimation is performed separately on randomly split subsets and the final estimate is obtained by combining the subset estimates.
Finally, to explore the power of modern artificial intelligence techniques to predict CR endpoints, we compare the performance of a variety of existing deep neural networks to the traditional Fine and Gray regression model, in terms of predictive gain. The finite-sample performances of the aforementioned modeling approaches were assessed via simulation studies and illustrated via application to the UNOS dataset.
Rights
© The Author
Is Part Of
VCU University Archives
Is Part Of
VCU Theses and Dissertations
Date of Submission
5-8-2025