DOI
https://doi.org/10.25772/29NS-N351
Author ORCID Identifier
0000-0001-6732-8711
Defense Date
2021
Document Type
Dissertation
Degree Name
Doctor of Philosophy
Department
Biostatistics
First Advisor
Dipankar Bandyopadhyay
Abstract
For analyzing large electronic health records (EHR) with time-to-event endpoints, such as in kidney transplantation, a major challenge is to provide accurate risk analyses while accounting for a multitude of epidemiological and statistical complexities. Motivated by a right-censored kidney transplantation EHR dataset derived from the United Network for Organ Sharing (UNOS), this dissertation, through two interrelated yet distinct projects, develops novel statistical procedures and methodologies to address some pressing issues arising in EHR-based research. In the first project, we aim to decouple the causal effects of treatments (here, subgroups defined by hepatitis C virus positive/negative donors and positive/negative recipients) on time to death of kidney transplant recipients due to kidney failure, post transplantation. Analytical complexities, such as heavy censoring, heavy imbalance between treatment groups, and multi-center clustering, were handled via a two-stage formulation. The first stage involved a multinomial propensity score evaluation (comparing the generalized boosted model and the covariate-balancing propensity score method); the resulting scores were then fed, as propensity weights, into a semiparametric cure-rate Cox proportional hazards frailty model of the failure times in the second stage. We show that our proposed method resulted in superior model fit while providing more informative analyses, compared to existing solutions. In the second project, we account for heavy censoring and informative cluster size in kidney transplantation EHR data by proposing an inverse cluster weighted accelerated failure time mixture cure model within a generalized estimating equations framework. Accelerated expectation-maximization (EM) algorithms were explored to provide a computationally tractable framework for these very large epidemiological databases.
An important contribution here is the development of residual-based model diagnostics for our mixture-cure setup. Furthermore, we evaluate the efficiency and robustness of our proposed model via extensive simulation studies and application to the UNOS EHR dataset.
Rights
© The Author
Is Part Of
VCU University Archives
Is Part Of
VCU Theses and Dissertations
Date of Submission
5-13-2021