DOI

https://doi.org/10.25772/29NS-N351

Author ORCID Identifier

0000-0001-6732-8711

Defense Date

2021

Document Type

Dissertation

Degree Name

Doctor of Philosophy

Department

Biostatistics

First Advisor

Dipankar Bandyopadhyay

Abstract

For analyzing large electronic health records (EHR) with time-to-event endpoints, such as in kidney transplantation, a major challenge is to provide accurate risk analyses while accounting for a multitude of epidemiological and statistical complexities. Motivated by a right-censored kidney transplantation EHR dataset derived from the United Network for Organ Sharing (UNOS), this dissertation comprises two interrelated yet distinctly different projects. Together they develop novel statistical procedures and methodologies to address some pressing issues in EHR-based research. In the first project, we aim to decouple the causal effects of treatments on time to death due to kidney failure among kidney transplant recipients after transplantation. Here, the treatments are subgroups defined by hepatitis C virus (HCV) status, such as HCV-positive/negative donors and HCV-positive/negative recipients. Analytical complexities, such as heavy censoring, heavy imbalance between treatment groups, and multi-center clustering, were handled via a two-stage formulation. In the first stage, multinomial propensity scores were estimated, comparing and contrasting the generalized boosted model and the covariate-balancing propensity score method. In the second stage, these scores were fed, as propensity weights, into a semiparametric cure-rate Cox proportional hazards frailty model of the failure times. We show that the proposed method yields a better model fit and more informative analyses than existing solutions. In the second project, we account for heavy censoring and informative cluster size in kidney transplantation EHR data by proposing an inverse cluster weighted accelerated failure time mixture cure model within a generalized estimating equations framework. Accelerated expectation-maximization (EM) algorithms were explored to provide a computationally tractable framework for these very large epidemiological databases.
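In the first project, the first stage produces a vector of estimated probabilities for each recipient, one per treatment group. This vector then enters the second stage as propensity weights. The sketch below computes stabilized inverse probability of treatment weights from those probabilities. It assumes the probabilities have already been estimated; the GBM or CBPS fit itself is not reproduced. The group labels are hypothetical. The abstract does not say whether the dissertation stabilizes its weights, so this is one common choice rather than the author's exact scheme.

```python
from collections import Counter
import math


def stabilized_iptw(treatments, propensities):
    """Stabilized inverse-probability-of-treatment weights, multi-level treatment.

    treatments:   observed group label for each recipient
    propensities: one dict per recipient mapping every group label to its
                  estimated probability (assumed to come from an already
                  fitted GBM or CBPS model, not shown here)
    Returns w_i = P(T = t_i) / e_{t_i}(x_i), where P(T = t) is the sample
    proportion of group t and e_t(x) is the estimated propensity score.
    """
    if len(treatments) != len(propensities):
        raise ValueError("one propensity vector per subject is required")
    n = len(treatments)
    marginal = {g: c / n for g, c in Counter(treatments).items()}
    weights = []
    for t, ps in zip(treatments, propensities):
        if not math.isclose(sum(ps.values()), 1.0, abs_tol=1e-8):
            raise ValueError("each propensity vector must sum to 1")
        weights.append(marginal[t] / ps[t])
    return weights


# Hypothetical labels for donor (D) / recipient (R) HCV status combinations.
treatments = ["D+/R+", "D+/R+", "D+/R-", "D-/R-"]
propensities = [
    {"D+/R+": 0.5, "D+/R-": 0.2, "D-/R+": 0.1, "D-/R-": 0.2},
    {"D+/R+": 0.25, "D+/R-": 0.25, "D-/R+": 0.25, "D-/R-": 0.25},
    {"D+/R+": 0.1, "D+/R-": 0.5, "D-/R+": 0.2, "D-/R-": 0.2},
    {"D+/R+": 0.375, "D+/R-": 0.25, "D-/R+": 0.25, "D-/R-": 0.125},
]
print(stabilized_iptw(treatments, propensities))  # [1.0, 2.0, 0.5, 2.0]
```

Recipients whose observed group was unlikely given their covariates receive larger weights. Stabilizing by the marginal group proportion keeps the weights centered near 1, which matters when groups are heavily imbalanced.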
An important contribution of the second project is the development of residual-based model diagnostics for the mixture-cure setup. Furthermore, we evaluate the efficiency and robustness of the proposed model via extensive simulation studies and an application to the UNOS EHR dataset.
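The sketch below shows how inverse cluster weighting and EM fit together in a mixture cure setting. It fits a deliberately simplified model: a cure fraction pi plus an exponential latency distribution for the uncured, with no covariates. Each subject is weighted by 1/n_k, where n_k is the size of its cluster (for example, a transplant center). This is a stand-in, not the dissertation's estimator. The semiparametric AFT latency, the GEE formulation, and the EM acceleration are all omitted. The simulated data are hypothetical.

```python
import math
import random


def weighted_cure_em(times, events, weights, tol=1e-8, max_iter=2000):
    """Plain EM for S_pop(t) = pi + (1 - pi) * exp(-lam * t).

    times:   observed follow-up times (event or censoring)
    events:  1 if the event was observed, 0 if right-censored
    weights: per-subject weights, e.g. 1 / (size of the subject's cluster)
    Returns (pi, lam): cure fraction and exponential rate of the uncured.
    """
    total_w = sum(weights)
    pi, lam = 0.5, len(times) / sum(times)  # crude starting values
    for _ in range(max_iter):
        # E-step: posterior probability that each subject is uncured.
        # Observed events are uncured with certainty; a censored subject is
        # uncured with probability (1 - pi) * S_u(t) / S_pop(t).
        post = []
        for t, d in zip(times, events):
            if d == 1:
                post.append(1.0)
            else:
                su = math.exp(-lam * t)
                post.append((1 - pi) * su / (pi + (1 - pi) * su))
        # M-step: closed-form weighted updates.
        new_pi = 1.0 - sum(w * p for w, p in zip(weights, post)) / total_w
        new_lam = (sum(w * d for w, d in zip(weights, events))
                   / sum(w * p * t for w, p, t in zip(weights, post, times)))
        converged = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam


def simulate(n_clusters, pi, lam, seed):
    """Hypothetical clustered data: cured subjects never fail."""
    rng = random.Random(seed)
    times, events, weights = [], [], []
    for _ in range(n_clusters):
        size = rng.randint(1, 10)
        for _ in range(size):
            cured = rng.random() < pi
            event_time = math.inf if cured else rng.expovariate(lam)
            censor_time = rng.uniform(0.0, 8.0)
            times.append(min(event_time, censor_time))
            events.append(1 if event_time <= censor_time else 0)
            weights.append(1.0 / size)  # inverse cluster size weight
    return times, events, weights


times, events, weights = simulate(n_clusters=2000, pi=0.4, lam=1.0, seed=2021)
pi_hat, lam_hat = weighted_cure_em(times, events, weights)
print(f"cure fraction ~ {pi_hat:.3f}, rate ~ {lam_hat:.3f}")
```

Only censored subjects carry a fractional posterior in this scheme, because an observed event rules out cure. The M-step then reduces to a weighted proportion for pi and a weighted events-over-exposure ratio for the rate.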

Rights

© The Author

Is Part Of

VCU University Archives

Is Part Of

VCU Theses and Dissertations

Date of Submission

5-13-2021

Available for download on Monday, May 11, 2026
