DOI

https://doi.org/10.25772/2J2B-Q560

Author ORCID Identifier

https://orcid.org/0000-0002-7192-4644

Defense Date

2025

Document Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Biostatistics

First Advisor

Dipankar Bandyopadhyay, Ph.D.

Abstract

The increasing availability of large-scale electronic health records (EHR) in biomedical research presents computational challenges, particularly when applying traditional statistical models that cannot handle the sheer volume and complexity of the data. Motivated by large EHR data derived from the United Network for Organ Sharing (UNOS) kidney transplantation registry, this dissertation addresses some of the challenges posed in modeling time-to-event competing risks (CR) endpoints. First, we focus on attaining scalability in marginal modeling of clustered CR endpoints (subjects within centers) via Cox-type regression on the cumulative incidence function. This is achieved via modification of a forward-backward scan algorithm, which reduces the computational complexity from $\mathcal{O}(n^2)$ to $\mathcal{O}(n)$. Second, we investigate the paradigm of informative cluster size (ICS), where the size of a study cluster (here, a kidney transplant center) can potentially be correlated with the time-to-event (here, CR) endpoints, and propose a novel nonparametric test to assess cluster informativeness. Leveraging modern empirical process theory, we rigorously establish the asymptotic properties of this test. Third, acknowledging the limitations of the traditional Cox regression framework, we consider a semiparametric weighted least squares accelerated failure time model for both independent and clustered CR data, in which the logarithm of the CR failure time is linearly related to a set of covariates. The associated computational bottleneck (due to $n \gg p$) is mitigated via a divide-and-combine approach, in which estimation is performed separately on randomly split subsets and the final estimate is obtained by combining the subset estimates. Finally, to explore the power of modern artificial intelligence techniques to predict CR endpoints, we compare the performance of a variety of existing deep neural networks with that of the traditional Fine and Gray regression model, in terms of predictive gain. The finite-sample performance of these modeling approaches is assessed via simulation studies and illustrated through application to the UNOS dataset.

Rights

© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission

5-8-2025

Available for download on Tuesday, May 07, 2030
